{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cell-0",
   "metadata": {},
   "source": [
    "# Stepwise Group-Wise Regression — Input Preparation & Model Selection\n",
    "\n",
    "This notebook assembles the combined input table for stepwise group-wise\n",
    "regression by merging BEV metrics with the full ACS variable set at the\n",
    "Alabama ZIP code level, then runs forward groupwise selection across all\n",
    "combinations of grouping level and response mode.\n",
    "\n",
    "### Steps\n",
    "\n",
    "1. **Load** `ZipCodeComparisons.csv` — the outer-joined ACS demographics + EV\n",
    "   registration crosstab produced by `3-VehicleCountCrosstabulation.ipynb`.\n",
    "2. **Choose a minimum-size threshold** — explore the effect of excluding ZIP codes\n",
    "   where `Population` or `TotalVehicleCount` is missing or below a threshold,\n",
    "   then select the largest threshold that excludes no more than 1 % of total\n",
    "   population or total vehicles.\n",
    "3. **Compute BEV per Capita** — `BEV / Population`.\n",
    "4. **Select & rename** — keep only `ZipCode`, `BEV Count`, `BEV Proportion`,\n",
    "   and `BEV per Capita`.\n",
    "5. **Left join** with the ACS data from\n",
    "   `ACS_ALZipCodes.csv`, bringing in all ~909 ACS variables for\n",
    "   each retained ZIP code.\n",
    "6. **Compute derived ACS variables** — read `ACSEnhanced_Dictionary.csv` and\n",
    "   evaluate the ~3,194 cumulative and proportional formulas encoded in the\n",
    "   derived variable names, adding each as a new column.\n",
    "7. **Save** to `CombinedInput.csv`.\n",
    "8. **Forward Groupwise Stepwise Regression** — run for every combination of:\n",
    "   - **Grouping level**: macro (55 ACS table groups) or micro (865 sub-groups)\n",
    "   - **Response mode**: individual (one DV at a time) or aggregate (mean Adj. $R^2$ across all 3 DVs)\n",
    "\n",
    "   Each combination saves to its own CSV.  Existing files are skipped.\n",
    "\n",
    "### Inputs\n",
    "\n",
    "| File | Description |\n",
    "|---|---|\n",
    "| `ZipCodeComparisons.csv` | Outer-joined ACS demographics + EV registration crosstab |\n",
    "| `ACS_ALZipCodes.csv` | ACS 5-Year data (one row per ZIP code, one column per variable) |\n",
    "| `ACSEnhanced_Dictionary.csv` | Enhanced dictionary with derived variable formulas |\n",
    "\n",
    "### Outputs\n",
    "\n",
    "| File | Description |\n",
    "|---|---|\n",
    "| `CombinedInput.csv` | One row per ZIP code: BEV metrics + base ACS + derived ACS variables |\n",
    "| `Results_macro_individual.csv` | Macro groups, each DV separately |\n",
    "| `Results_macro_aggregate.csv` | Macro groups, mean Adj. $R^2$ across all DVs |\n",
    "| `Results_micro_individual.csv` | Micro sub-groups, each DV separately |\n",
    "| `Results_micro_aggregate.csv` | Micro sub-groups, mean Adj. $R^2$ across all DVs |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "cell-1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import polars as pl"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-2",
   "metadata": {},
   "source": [
    "## 1. Load ZipCodeComparisons\n",
    "\n",
    "Load the outer-joined table produced by `3-VehicleCountCrosstabulation.ipynb`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "cell-3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Raw shape: (760, 7)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 7)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ZipCode</th><th>Population</th><th>LandArea</th><th>PopulationDensity</th><th>BEV</th><th>TotalVehicleCount</th><th>BEVProportion</th></tr><tr><td>i64</td><td>f64</td><td>f64</td><td>f64</td><td>i64</td><td>i64</td><td>f64</td></tr></thead><tbody><tr><td>35004</td><td>12155.0</td><td>17.239</td><td>0.001418</td><td>55</td><td>11480</td><td>0.004791</td></tr><tr><td>35005</td><td>8247.0</td><td>31.747</td><td>0.00385</td><td>14</td><td>7500</td><td>0.001867</td></tr><tr><td>35006</td><td>3894.0</td><td>103.993</td><td>0.026706</td><td>4</td><td>3342</td><td>0.001197</td></tr><tr><td>35007</td><td>28586.0</td><td>39.048</td><td>0.001366</td><td>134</td><td>27636</td><td>0.004849</td></tr><tr><td>35010</td><td>19347.0</td><td>222.707</td><td>0.011511</td><td>44</td><td>20255</td><td>0.002172</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 7)\n",
       "┌─────────┬────────────┬──────────┬───────────────────┬─────┬───────────────────┬───────────────┐\n",
       "│ ZipCode ┆ Population ┆ LandArea ┆ PopulationDensity ┆ BEV ┆ TotalVehicleCount ┆ BEVProportion │\n",
       "│ ---     ┆ ---        ┆ ---      ┆ ---               ┆ --- ┆ ---               ┆ ---           │\n",
       "│ i64     ┆ f64        ┆ f64      ┆ f64               ┆ i64 ┆ i64               ┆ f64           │\n",
       "╞═════════╪════════════╪══════════╪═══════════════════╪═════╪═══════════════════╪═══════════════╡\n",
       "│ 35004   ┆ 12155.0    ┆ 17.239   ┆ 0.001418          ┆ 55  ┆ 11480             ┆ 0.004791      │\n",
       "│ 35005   ┆ 8247.0     ┆ 31.747   ┆ 0.00385           ┆ 14  ┆ 7500              ┆ 0.001867      │\n",
       "│ 35006   ┆ 3894.0     ┆ 103.993  ┆ 0.026706          ┆ 4   ┆ 3342              ┆ 0.001197      │\n",
       "│ 35007   ┆ 28586.0    ┆ 39.048   ┆ 0.001366          ┆ 134 ┆ 27636             ┆ 0.004849      │\n",
       "│ 35010   ┆ 19347.0    ┆ 222.707  ┆ 0.011511          ┆ 44  ┆ 20255             ┆ 0.002172      │\n",
       "└─────────┴────────────┴──────────┴───────────────────┴─────┴───────────────────┴───────────────┘"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "comparisons = pl.read_csv(\"ZipCodeComparisons.csv\")\n",
    "print(\"Raw shape:\", comparisons.shape)\n",
    "comparisons.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "986qh493gc4",
   "metadata": {},
   "source": [
    "## 2. Choose a minimum-size threshold\n",
    "\n",
    "A ZIP code is **excluded** when either `Population` or `TotalVehicleCount` is\n",
    "`null` or falls below a given threshold.  The table below shows, for a range\n",
    "of candidate thresholds, the percentage of total population and total vehicles\n",
    "that would be **excluded**.\n",
    "\n",
    "The threshold is chosen as the **largest value that excludes no more than 1 %**\n",
    "of either total population or total vehicles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "yudms5bncyp",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Coarse sweep (step 100):\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (11, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Threshold</th><th>ZIP Codes Kept</th><th>ZIP Codes Excluded</th><th>% Population Excluded</th><th>% Vehicles Excluded</th></tr><tr><td>i64</td><td>i64</td><td>i64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>0</td><td>653</td><td>107</td><td>0.0237</td><td>0.0425</td></tr><tr><td>100</td><td>587</td><td>173</td><td>0.3505</td><td>0.1123</td></tr><tr><td>200</td><td>577</td><td>183</td><td>0.4906</td><td>0.1535</td></tr><tr><td>300</td><td>568</td><td>192</td><td>0.5492</td><td>0.2182</td></tr><tr><td>400</td><td>560</td><td>200</td><td>0.6282</td><td>0.2945</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>600</td><td>541</td><td>219</td><td>0.8424</td><td>0.6033</td></tr><tr><td>700</td><td>527</td><td>233</td><td>1.036</td><td>0.8503</td></tr><tr><td>800</td><td>515</td><td>245</td><td>1.2337</td><td>1.0816</td></tr><tr><td>900</td><td>504</td><td>256</td><td>1.4295</td><td>1.3126</td></tr><tr><td>1000</td><td>497</td><td>263</td><td>1.5642</td><td>1.4696</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (11, 5)\n",
       "┌───────────┬────────────────┬────────────────────┬───────────────────────┬─────────────────────┐\n",
       "│ Threshold ┆ ZIP Codes Kept ┆ ZIP Codes Excluded ┆ % Population Excluded ┆ % Vehicles Excluded │\n",
       "│ ---       ┆ ---            ┆ ---                ┆ ---                   ┆ ---                 │\n",
       "│ i64       ┆ i64            ┆ i64                ┆ f64                   ┆ f64                 │\n",
       "╞═══════════╪════════════════╪════════════════════╪═══════════════════════╪═════════════════════╡\n",
       "│ 0         ┆ 653            ┆ 107                ┆ 0.0237                ┆ 0.0425              │\n",
       "│ 100       ┆ 587            ┆ 173                ┆ 0.3505                ┆ 0.1123              │\n",
       "│ 200       ┆ 577            ┆ 183                ┆ 0.4906                ┆ 0.1535              │\n",
       "│ 300       ┆ 568            ┆ 192                ┆ 0.5492                ┆ 0.2182              │\n",
       "│ 400       ┆ 560            ┆ 200                ┆ 0.6282                ┆ 0.2945              │\n",
       "│ …         ┆ …              ┆ …                  ┆ …                     ┆ …                   │\n",
       "│ 600       ┆ 541            ┆ 219                ┆ 0.8424                ┆ 0.6033              │\n",
       "│ 700       ┆ 527            ┆ 233                ┆ 1.036                 ┆ 0.8503              │\n",
       "│ 800       ┆ 515            ┆ 245                ┆ 1.2337                ┆ 1.0816              │\n",
       "│ 900       ┆ 504            ┆ 256                ┆ 1.4295                ┆ 1.3126              │\n",
       "│ 1000      ┆ 497            ┆ 263                ┆ 1.5642                ┆ 1.4696              │\n",
       "└───────────┴────────────────┴────────────────────┴───────────────────────┴─────────────────────┘"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "MAX_EXCLUSION = 0.01  # no more than 1 % of people or vehicles excluded\n",
    "\n",
    "# Universe totals (across all non-null rows for each measure)\n",
    "total_pop = comparisons.select(pl.col(\"Population\").sum()).item()\n",
    "total_veh = comparisons.select(pl.col(\"TotalVehicleCount\").sum()).item()\n",
    "total_zips = comparisons.height\n",
    "\n",
    "def threshold_sweep(thresholds):\n",
    "    \"\"\"Compute exclusion stats for a list of threshold values.\"\"\"\n",
    "    rows = []\n",
    "    for t in thresholds:\n",
    "        kept = comparisons.filter(\n",
    "            (pl.col(\"Population\").is_not_null() & (pl.col(\"Population\") >= t))\n",
    "            & (pl.col(\"TotalVehicleCount\").is_not_null() & (pl.col(\"TotalVehicleCount\") >= t))\n",
    "        )\n",
    "        kept_pop = kept.select(pl.col(\"Population\").sum()).item()\n",
    "        kept_veh = kept.select(pl.col(\"TotalVehicleCount\").sum()).item()\n",
    "        pct_pop_excl = (total_pop - kept_pop) / total_pop\n",
    "        pct_veh_excl = (total_veh - kept_veh) / total_veh\n",
    "        rows.append({\n",
    "            \"Threshold\": t,\n",
    "            \"ZIP Codes Kept\": kept.height,\n",
    "            \"ZIP Codes Excluded\": total_zips - kept.height,\n",
    "            \"% Population Excluded\": round(pct_pop_excl * 100, 4),\n",
    "            \"% Vehicles Excluded\": round(pct_veh_excl * 100, 4),\n",
    "        })\n",
    "    return pl.DataFrame(rows)\n",
    "\n",
    "# --- Pass 1: coarse sweep (step 100) ---\n",
    "coarse = threshold_sweep(range(0, 1100, 100))\n",
    "print(\"Coarse sweep (step 100):\")\n",
    "coarse"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "obxwy7y1zr",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fine sweep (step 10) around 600–700:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (11, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Threshold</th><th>ZIP Codes Kept</th><th>ZIP Codes Excluded</th><th>% Population Excluded</th><th>% Vehicles Excluded</th></tr><tr><td>i64</td><td>i64</td><td>i64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>600</td><td>541</td><td>219</td><td>0.8424</td><td>0.6033</td></tr><tr><td>610</td><td>541</td><td>219</td><td>0.8424</td><td>0.6033</td></tr><tr><td>620</td><td>540</td><td>220</td><td>0.8545</td><td>0.6194</td></tr><tr><td>630</td><td>539</td><td>221</td><td>0.8674</td><td>0.6317</td></tr><tr><td>640</td><td>537</td><td>223</td><td>0.893</td><td>0.6644</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>660</td><td>537</td><td>223</td><td>0.893</td><td>0.6644</td></tr><tr><td>670</td><td>534</td><td>226</td><td>0.9322</td><td>0.7152</td></tr><tr><td>680</td><td>530</td><td>230</td><td>0.9851</td><td>0.7963</td></tr><tr><td>690</td><td>529</td><td>231</td><td>0.9985</td><td>0.8183</td></tr><tr><td>700</td><td>527</td><td>233</td><td>1.036</td><td>0.8503</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (11, 5)\n",
       "┌───────────┬────────────────┬────────────────────┬───────────────────────┬─────────────────────┐\n",
       "│ Threshold ┆ ZIP Codes Kept ┆ ZIP Codes Excluded ┆ % Population Excluded ┆ % Vehicles Excluded │\n",
       "│ ---       ┆ ---            ┆ ---                ┆ ---                   ┆ ---                 │\n",
       "│ i64       ┆ i64            ┆ i64                ┆ f64                   ┆ f64                 │\n",
       "╞═══════════╪════════════════╪════════════════════╪═══════════════════════╪═════════════════════╡\n",
       "│ 600       ┆ 541            ┆ 219                ┆ 0.8424                ┆ 0.6033              │\n",
       "│ 610       ┆ 541            ┆ 219                ┆ 0.8424                ┆ 0.6033              │\n",
       "│ 620       ┆ 540            ┆ 220                ┆ 0.8545                ┆ 0.6194              │\n",
       "│ 630       ┆ 539            ┆ 221                ┆ 0.8674                ┆ 0.6317              │\n",
       "│ 640       ┆ 537            ┆ 223                ┆ 0.893                 ┆ 0.6644              │\n",
       "│ …         ┆ …              ┆ …                  ┆ …                     ┆ …                   │\n",
       "│ 660       ┆ 537            ┆ 223                ┆ 0.893                 ┆ 0.6644              │\n",
       "│ 670       ┆ 534            ┆ 226                ┆ 0.9322                ┆ 0.7152              │\n",
       "│ 680       ┆ 530            ┆ 230                ┆ 0.9851                ┆ 0.7963              │\n",
       "│ 690       ┆ 529            ┆ 231                ┆ 0.9985                ┆ 0.8183              │\n",
       "│ 700       ┆ 527            ┆ 233                ┆ 1.036                 ┆ 0.8503              │\n",
       "└───────────┴────────────────┴────────────────────┴───────────────────────┴─────────────────────┘"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# --- Pass 2: fine sweep (step 10) around the coarse boundary ---\n",
    "# Find the last coarse threshold that's still within the limit\n",
    "coarse_ok = coarse.filter(\n",
    "    (pl.col(\"% Population Excluded\") <= MAX_EXCLUSION * 100)\n",
    "    & (pl.col(\"% Vehicles Excluded\") <= MAX_EXCLUSION * 100)\n",
    ")\n",
    "coarse_best = coarse_ok.select(pl.col(\"Threshold\").max()).item()\n",
    "\n",
    "fine = threshold_sweep(range(coarse_best, coarse_best + 110, 10))\n",
    "print(f\"Fine sweep (step 10) around {coarse_best}–{coarse_best + 100}:\")\n",
    "fine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "hhw4n99j17",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Selected threshold: 690\n",
      "  ZIP codes kept:       529  (231 excluded)\n",
      "  Population excluded:  0.9985 %\n",
      "  Vehicles excluded:    0.8183 %\n",
      "\n",
      "After threshold 690: 529 ZIP codes retained (231 excluded)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 7)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ZipCode</th><th>Population</th><th>LandArea</th><th>PopulationDensity</th><th>BEV</th><th>TotalVehicleCount</th><th>BEVProportion</th></tr><tr><td>i64</td><td>f64</td><td>f64</td><td>f64</td><td>i64</td><td>i64</td><td>f64</td></tr></thead><tbody><tr><td>35004</td><td>12155.0</td><td>17.239</td><td>0.001418</td><td>55</td><td>11480</td><td>0.004791</td></tr><tr><td>35005</td><td>8247.0</td><td>31.747</td><td>0.00385</td><td>14</td><td>7500</td><td>0.001867</td></tr><tr><td>35006</td><td>3894.0</td><td>103.993</td><td>0.026706</td><td>4</td><td>3342</td><td>0.001197</td></tr><tr><td>35007</td><td>28586.0</td><td>39.048</td><td>0.001366</td><td>134</td><td>27636</td><td>0.004849</td></tr><tr><td>35010</td><td>19347.0</td><td>222.707</td><td>0.011511</td><td>44</td><td>20255</td><td>0.002172</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 7)\n",
       "┌─────────┬────────────┬──────────┬───────────────────┬─────┬───────────────────┬───────────────┐\n",
       "│ ZipCode ┆ Population ┆ LandArea ┆ PopulationDensity ┆ BEV ┆ TotalVehicleCount ┆ BEVProportion │\n",
       "│ ---     ┆ ---        ┆ ---      ┆ ---               ┆ --- ┆ ---               ┆ ---           │\n",
       "│ i64     ┆ f64        ┆ f64      ┆ f64               ┆ i64 ┆ i64               ┆ f64           │\n",
       "╞═════════╪════════════╪══════════╪═══════════════════╪═════╪═══════════════════╪═══════════════╡\n",
       "│ 35004   ┆ 12155.0    ┆ 17.239   ┆ 0.001418          ┆ 55  ┆ 11480             ┆ 0.004791      │\n",
       "│ 35005   ┆ 8247.0     ┆ 31.747   ┆ 0.00385           ┆ 14  ┆ 7500              ┆ 0.001867      │\n",
       "│ 35006   ┆ 3894.0     ┆ 103.993  ┆ 0.026706          ┆ 4   ┆ 3342              ┆ 0.001197      │\n",
       "│ 35007   ┆ 28586.0    ┆ 39.048   ┆ 0.001366          ┆ 134 ┆ 27636             ┆ 0.004849      │\n",
       "│ 35010   ┆ 19347.0    ┆ 222.707  ┆ 0.011511          ┆ 44  ┆ 20255             ┆ 0.002172      │\n",
       "└─────────┴────────────┴──────────┴───────────────────┴─────┴───────────────────┴───────────────┘"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Pick the largest fine threshold where both exclusion rates are within the limit\n",
    "fine_ok = fine.filter(\n",
    "    (pl.col(\"% Population Excluded\") <= MAX_EXCLUSION * 100)\n",
    "    & (pl.col(\"% Vehicles Excluded\") <= MAX_EXCLUSION * 100)\n",
    ")\n",
    "THRESHOLD = fine_ok.select(pl.col(\"Threshold\").max()).item()\n",
    "\n",
    "chosen = fine.filter(pl.col(\"Threshold\") == THRESHOLD)\n",
    "pct_pop = chosen[\"% Population Excluded\"][0]\n",
    "pct_veh = chosen[\"% Vehicles Excluded\"][0]\n",
    "n_kept = chosen[\"ZIP Codes Kept\"][0]\n",
    "n_excl = chosen[\"ZIP Codes Excluded\"][0]\n",
    "\n",
    "print(f\"Selected threshold: {THRESHOLD}\")\n",
    "print(f\"  ZIP codes kept:       {n_kept}  ({n_excl} excluded)\")\n",
    "print(f\"  Population excluded:  {pct_pop:.4f} %\")\n",
    "print(f\"  Vehicles excluded:    {pct_veh:.4f} %\")\n",
    "\n",
    "filtered = comparisons.filter(\n",
    "    (pl.col(\"Population\").is_not_null() & (pl.col(\"Population\") >= THRESHOLD))\n",
    "    & (pl.col(\"TotalVehicleCount\").is_not_null() & (pl.col(\"TotalVehicleCount\") >= THRESHOLD))\n",
    ")\n",
    "\n",
    "print(f\"\\nAfter threshold {THRESHOLD}: {filtered.height} ZIP codes retained \"\n",
    "      f\"({comparisons.height - filtered.height} excluded)\")\n",
    "filtered.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-4",
   "metadata": {},
   "source": [
    "## 3. Add BEV per Capita and select columns\n",
    "\n",
    "- **BEV per Capita** = `BEV / Population`\n",
    "- Rename `BEV` → `BEV Count` and `BEVProportion` → `BEV Proportion`\n",
    "- Keep only `ZipCode`, `BEV Count`, `BEV Proportion`, `BEV per Capita`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "cell-5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "BEV metrics shape: (529, 4)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 4)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ZipCode</th><th>BEV Count</th><th>BEV Proportion</th><th>BEV per Capita</th></tr><tr><td>i64</td><td>i64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>35004</td><td>55</td><td>0.004791</td><td>0.004525</td></tr><tr><td>35005</td><td>14</td><td>0.001867</td><td>0.001698</td></tr><tr><td>35006</td><td>4</td><td>0.001197</td><td>0.001027</td></tr><tr><td>35007</td><td>134</td><td>0.004849</td><td>0.004688</td></tr><tr><td>35010</td><td>44</td><td>0.002172</td><td>0.002274</td></tr><tr><td>35014</td><td>8</td><td>0.001636</td><td>0.002485</td></tr><tr><td>35016</td><td>73</td><td>0.00383</td><td>0.004255</td></tr><tr><td>35019</td><td>3</td><td>0.00108</td><td>0.001365</td></tr><tr><td>35020</td><td>22</td><td>0.001009</td><td>0.000884</td></tr><tr><td>35022</td><td>162</td><td>0.006288</td><td>0.007005</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 4)\n",
       "┌─────────┬───────────┬────────────────┬────────────────┐\n",
       "│ ZipCode ┆ BEV Count ┆ BEV Proportion ┆ BEV per Capita │\n",
       "│ ---     ┆ ---       ┆ ---            ┆ ---            │\n",
       "│ i64     ┆ i64       ┆ f64            ┆ f64            │\n",
       "╞═════════╪═══════════╪════════════════╪════════════════╡\n",
       "│ 35004   ┆ 55        ┆ 0.004791       ┆ 0.004525       │\n",
       "│ 35005   ┆ 14        ┆ 0.001867       ┆ 0.001698       │\n",
       "│ 35006   ┆ 4         ┆ 0.001197       ┆ 0.001027       │\n",
       "│ 35007   ┆ 134       ┆ 0.004849       ┆ 0.004688       │\n",
       "│ 35010   ┆ 44        ┆ 0.002172       ┆ 0.002274       │\n",
       "│ 35014   ┆ 8         ┆ 0.001636       ┆ 0.002485       │\n",
       "│ 35016   ┆ 73        ┆ 0.00383        ┆ 0.004255       │\n",
       "│ 35019   ┆ 3         ┆ 0.00108        ┆ 0.001365       │\n",
       "│ 35020   ┆ 22        ┆ 0.001009       ┆ 0.000884       │\n",
       "│ 35022   ┆ 162       ┆ 0.006288       ┆ 0.007005       │\n",
       "└─────────┴───────────┴────────────────┴────────────────┘"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bev_metrics = (\n",
    "    filtered\n",
    "    .with_columns(\n",
    "        (pl.col(\"BEV\") / pl.col(\"Population\")).alias(\"BEV per Capita\"),\n",
    "    )\n",
    "    .rename({\"BEV\": \"BEV Count\", \"BEVProportion\": \"BEV Proportion\"})\n",
    "    .select(\"ZipCode\", \"BEV Count\", \"BEV Proportion\", \"BEV per Capita\")\n",
    ")\n",
    "\n",
    "print(\"BEV metrics shape:\", bev_metrics.shape)\n",
    "bev_metrics.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-6",
   "metadata": {},
   "source": [
    "## 4. Load ACS data\n",
    "\n",
    "The ACS file has one row per ZIP code and one column per ACS variable.\n",
    "Rename `Location` to `ZipCode` and cast to `Int64` to match the BEV\n",
    "metrics table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "cell-7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ACS shape: (656, 910)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 910)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ZipCode</th><th>B01002_E001</th><th>B01002_E002</th><th>B01002_E003</th><th>B01003_E001</th><th>B02001_E001</th><th>B02001_E002</th><th>B02001_E003</th><th>B02001_E004</th><th>B02001_E005</th><th>B02001_E006</th><th>B02001_E007</th><th>B02001_E008</th><th>B02001_E009</th><th>B02001_E010</th><th>B05001_E001</th><th>B05001_E002</th><th>B05001_E003</th><th>B05001_E004</th><th>B05001_E005</th><th>B05001_E006</th><th>B05002_E001</th><th>B05002_E002</th><th>B05002_E003</th><th>B05002_E004</th><th>B05002_E005</th><th>B05002_E006</th><th>B05002_E007</th><th>B05002_E008</th><th>B05002_E009</th><th>B05002_E010</th><th>B05002_E011</th><th>B05002_E012</th><th>B05002_E013</th><th>B05002_E014</th><th>B05002_E015</th><th>B05002_E016</th><th>&hellip;</th><th>C24050_E050</th><th>C24050_E051</th><th>C24050_E052</th><th>C24050_E053</th><th>C24050_E054</th><th>C24050_E055</th><th>C24050_E056</th><th>C24050_E057</th><th>C24050_E058</th><th>C24050_E059</th><th>C24050_E060</th><th>C24050_E061</th><th>C24050_E062</th><th>C24050_E063</th><th>C24050_E064</th><th>C24050_E065</th><th>C24050_E066</th><th>C24050_E067</th><th>C24050_E068</th><th>C24050_E069</th><th>C24050_E070</th><th>C24050_E071</th><th>C24050_E072</th><th>C24050_E073</th><th>C24050_E074</th><th>C24050_E075</th><th>C24050_E076</th><th>C24050_E077</th><th>C24050_E078</th><th>C24050_E079</th><th>C24050_E080</th><th>C24050_E081</th><th>C24050_E082</th><th>C24050_E083</th><th>C24050_E084</th><th>LandArea</th><th>PopulationDensity</th></tr><tr><td>i64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>&hellip;</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>35004</td><td>43.2</td><td>41.6</td><td>44.6</td><td>12155.0</td><td>12155.0</td><td>9472.0</td><td>1646.0</td><td>141.0</td><td>106.0</td><td>0.0</td><td>0.0</td><td>790.0</td><td>224.0</td><td>566.0</td><td>12155.0</td><td>11705.0</td><td>0.0</td><td>104.0</td><td>274.0</td><td>72.0</td><td>12155.0</td><td>11809.0</td><td>8042.0</td><td>3663.0</td><td>708.0</td><td>707.0</td><td>2000.0</td><td>248.0</td><td>104.0</td><td>0.0</td><td>0.0</td><td>104.0</td><td>346.0</td><td>274.0</td><td>33.0</td><td>124.0</td><td>&hellip;</td><td>16.0</td><td>341.0</td><td>123.0</td><td>341.0</td><td>65.0</td><td>102.0</td><td>0.0</td><td>476.0</td><td>0.0</td><td>283.0</td><td>0.0</td><td>0.0</td><td>78.0</td><td>23.0</td><td>0.0</td><td>10.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>41.0</td><td>41.0</td><td>797.0</td><td>0.0</td><td>0.0</td><td>219.0</td><td>64.0</td><td>201.0</td><td>155.0</td><td>38.0</td><td>0.0</td><td>102.0</td><td>18.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>17.239</td><td>0.001418</td></tr><tr><td>35005</td><td>41.7</td><td>36.9</td><td>46.2</td><td>8247.0</td><td>8247.0</td><td>3022.0</td><td>4559.0</td><td>0.0</td><td>86.0</td><td>538.0</td><td>0.0</td><td>42.0</td><td>18.0</td><td>24.0</td><td>8247.0</td><td>7955.0</td><td>0.0</td><td>0.0</td><td>30.0</td><td>262.0</td><td>8247.0</td><td>7955.0</td><td>6295.0</td><td>1660.0</td><td>63.0</td><td>659.0</td><td>795.0</td><td>143.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>292.0</td><td>30.0</td><td>2.0</td><td>28.0</td><td>&hellip;</td><td>53.0</td><td>382.0</td><td>32.0</td><td>203.0</td><td>2.0</td><td>4.0</td><td>34.0</td><td>240.0</td><td>0.0</td><td>81.0</td><td>2.0</td><td>5.0</td><td>3.0</td><td>39.0</td><td>0.0</td><td>41.0</td><td>19.0</td><td>5.0</td><td>21.0</td><td>24.0</td><td>0.0</td><td>840.0</td><td>0.0</td><td>18.0</td><td>196.0</td><td>38.0</td><td>274.0</td><td>158.0</td><td>0.0</td><td>0.0</td><td>33.0</td><td>39.0</td><td>3.0</td><td>65.0</td><td>16.0</td><td>31.747</td><td>0.00385</td></tr><tr><td>35006</td><td>38.5</td><td>38.6</td><td>38.5</td><td>3894.0</td><td>3894.0</td><td>3468.0</td><td>313.0</td><td>2.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>111.0</td><td>94.0</td><td>17.0</td><td>3894.0</td><td>3884.0</td><td>0.0</td><td>0.0</td><td>10.0</td><td>0.0</td><td>3894.0</td><td>3884.0</td><td>3418.0</td><td>466.0</td><td>56.0</td><td>97.0</td><td>293.0</td><td>20.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>10.0</td><td>10.0</td><td>0.0</td><td>0.0</td><td>&hellip;</td><td>0.0</td><td>39.0</td><td>80.0</td><td>98.0</td><td>0.0</td><td>12.0</td><td>190.0</td><td>354.0</td><td>46.0</td><td>87.0</td><td>40.0</td><td>0.0</td><td>95.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>55.0</td><td>0.0</td><td>0.0</td><td>14.0</td><td>17.0</td><td>301.0</td><td>0.0</td><td>24.0</td><td>144.0</td><td>0.0</td><td>28.0</td><td>19.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>42.0</td><td>0.0</td><td>44.0</td><td>0.0</td><td>103.993</td><td>0.026706</td></tr><tr><td>35007</td><td>39.2</td><td>38.6</td><td>39.5</td><td>28586.0</td><td>28586.0</td><td>19927.0</td><td>4502.0</td><td>68.0</td><td>376.0</td><td>0.0</td><td>1936.0</td><td>1777.0</td><td>891.0</td><td>886.0</td><td>28586.0</td><td>26145.0</td><td>0.0</td><td>292.0</td><td>867.0</td><td>1282.0</td><td>28586.0</td><td>26437.0</td><td>17530.0</td><td>8615.0</td><td>1076.0</td><td>1396.0</td><td>5275.0</td><td>868.0</td><td>292.0</td><td>0.0</td><td>0.0</td><td>292.0</td><td>2149.0</td><td>867.0</td><td>98.0</td><td>116.0</td><td>&hellip;</td><td>26.0</td><td>639.0</td><td>279.0</td><td>252.0</td><td>185.0</td><td>128.0</td><td>26.0</td><td>1584.0</td><td>35.0</td><td>1002.0</td><td>38.0</td><td>6.0</td><td>94.0</td><td>32.0</td><td>80.0</td><td>95.0</td><td>39.0</td><td>26.0</td><td>0.0</td><td>131.0</td><td>6.0</td><td>1614.0</td><td>0.0</td><td>99.0</td><td>350.0</td><td>94.0</td><td>422.0</td><td>309.0</td><td>0.0</td><td>0.0</td><td>48.0</td><td>123.0</td><td>33.0</td><td>121.0</td><td>15.0</td><td>39.048</td><td>0.001366</td></tr><tr><td>35010</td><td>42.5</td><td>42.3</td><td>43.2</td><td>19347.0</td><td>19347.0</td><td>11732.0</td><td>5589.0</td><td>22.0</td><td>189.0</td><td>0.0</td><td>447.0</td><td>1368.0</td><td>906.0</td><td>462.0</td><td>19347.0</td><td>18635.0</td><td>104.0</td><td>32.0</td><td>99.0</td><td>477.0</td><td>19347.0</td><td>18771.0</td><td>15851.0</td><td>2784.0</td><td>145.0</td><td>706.0</td><td>1731.0</td><td>202.0</td><td>136.0</td><td>104.0</td><td>0.0</td><td>32.0</td><td>576.0</td><td>99.0</td><td>71.0</td><td>28.0</td><td>&hellip;</td><td>10.0</td><td>226.0</td><td>40.0</td><td>219.0</td><td>67.0</td><td>64.0</td><td>62.0</td><td>845.0</td><td>27.0</td><td>461.0</td><td>60.0</td><td>33.0</td><td>0.0</td><td>36.0</td><td>9.0</td><td>0.0</td><td>0.0</td><td>59.0</td><td>77.0</td><td>83.0</td><td>0.0</td><td>1707.0</td><td>0.0</td><td>14.0</td><td>1193.0</td><td>9.0</td><td>169.0</td><td>218.0</td><td>0.0</td><td>0.0</td><td>34.0</td><td>57.0</td><td>9.0</td><td>0.0</td><td>4.0</td><td>222.707</td><td>0.011511</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 910)\n",
       "┌─────────┬─────────────┬─────────────┬─────────────┬───┬─────────────┬─────────────┬──────────┬───────────────────┐\n",
       "│ ZipCode ┆ B01002_E001 ┆ B01002_E002 ┆ B01002_E003 ┆ … ┆ C24050_E083 ┆ C24050_E084 ┆ LandArea ┆ PopulationDensity │\n",
       "│ ---     ┆ ---         ┆ ---         ┆ ---         ┆   ┆ ---         ┆ ---         ┆ ---      ┆ ---               │\n",
       "│ i64     ┆ f64         ┆ f64         ┆ f64         ┆   ┆ f64         ┆ f64         ┆ f64      ┆ f64               │\n",
       "╞═════════╪═════════════╪═════════════╪═════════════╪═══╪═════════════╪═════════════╪══════════╪═══════════════════╡\n",
       "│ 35004   ┆ 43.2        ┆ 41.6        ┆ 44.6        ┆ … ┆ 0.0         ┆ 0.0         ┆ 17.239   ┆ 0.001418          │\n",
       "│ 35005   ┆ 41.7        ┆ 36.9        ┆ 46.2        ┆ … ┆ 65.0        ┆ 16.0        ┆ 31.747   ┆ 0.00385           │\n",
       "│ 35006   ┆ 38.5        ┆ 38.6        ┆ 38.5        ┆ … ┆ 44.0        ┆ 0.0         ┆ 103.993  ┆ 0.026706          │\n",
       "│ 35007   ┆ 39.2        ┆ 38.6        ┆ 39.5        ┆ … ┆ 121.0       ┆ 15.0        ┆ 39.048   ┆ 0.001366          │\n",
       "│ 35010   ┆ 42.5        ┆ 42.3        ┆ 43.2        ┆ … ┆ 0.0         ┆ 4.0         ┆ 222.707  ┆ 0.011511          │\n",
       "└─────────┴─────────────┴─────────────┴─────────────┴───┴─────────────┴─────────────┴──────────┴───────────────────┘"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "acs = (\n",
    "    pl.read_csv(\"ACS_ALZipCodes.csv\")\n",
    "    .rename({\"Location\": \"ZipCode\"})\n",
    "    .with_columns(pl.col(\"ZipCode\").cast(pl.Int64))\n",
    ")\n",
    "\n",
    "print(\"ACS shape:\", acs.shape)\n",
    "acs.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-8",
   "metadata": {},
   "source": [
    "## 5. Left join BEV metrics with ACS variables\n",
    "\n",
    "A **left join** keeps every ZIP code from the filtered BEV metrics table and\n",
    "attaches the corresponding ACS variables.  Because the BEV metrics were already\n",
    "filtered to ZIP codes with sufficient population, every row should find a match\n",
    "in the ACS data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "cell-9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Combined shape: (529, 913)\n",
      "Columns: ['ZipCode', 'BEV Count', 'BEV Proportion', 'BEV per Capita', 'B01002_E001', 'B01002_E002', 'B01002_E003', 'B01003_E001', 'B02001_E001', 'B02001_E002'] ...\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 913)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ZipCode</th><th>BEV Count</th><th>BEV Proportion</th><th>BEV per Capita</th><th>B01002_E001</th><th>B01002_E002</th><th>B01002_E003</th><th>B01003_E001</th><th>B02001_E001</th><th>B02001_E002</th><th>B02001_E003</th><th>B02001_E004</th><th>B02001_E005</th><th>B02001_E006</th><th>B02001_E007</th><th>B02001_E008</th><th>B02001_E009</th><th>B02001_E010</th><th>B05001_E001</th><th>B05001_E002</th><th>B05001_E003</th><th>B05001_E004</th><th>B05001_E005</th><th>B05001_E006</th><th>B05002_E001</th><th>B05002_E002</th><th>B05002_E003</th><th>B05002_E004</th><th>B05002_E005</th><th>B05002_E006</th><th>B05002_E007</th><th>B05002_E008</th><th>B05002_E009</th><th>B05002_E010</th><th>B05002_E011</th><th>B05002_E012</th><th>B05002_E013</th><th>&hellip;</th><th>C24050_E050</th><th>C24050_E051</th><th>C24050_E052</th><th>C24050_E053</th><th>C24050_E054</th><th>C24050_E055</th><th>C24050_E056</th><th>C24050_E057</th><th>C24050_E058</th><th>C24050_E059</th><th>C24050_E060</th><th>C24050_E061</th><th>C24050_E062</th><th>C24050_E063</th><th>C24050_E064</th><th>C24050_E065</th><th>C24050_E066</th><th>C24050_E067</th><th>C24050_E068</th><th>C24050_E069</th><th>C24050_E070</th><th>C24050_E071</th><th>C24050_E072</th><th>C24050_E073</th><th>C24050_E074</th><th>C24050_E075</th><th>C24050_E076</th><th>C24050_E077</th><th>C24050_E078</th><th>C24050_E079</th><th>C24050_E080</th><th>C24050_E081</th><th>C24050_E082</th><th>C24050_E083</th><th>C24050_E084</th><th>LandArea</th><th>PopulationDensity</th></tr><tr><td>i64</td><td>i64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>&hellip;</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>35004</td><td>55</td><td>0.004791</td><td>0.004525</td><td>43.2</td><td>41.6</td><td>44.6</td><td>12155.0</td><td>12155.0</td><td>9472.0</td><td>1646.0</td><td>141.0</td><td>106.0</td><td>0.0</td><td>0.0</td><td>790.0</td><td>224.0</td><td>566.0</td><td>12155.0</td><td>11705.0</td><td>0.0</td><td>104.0</td><td>274.0</td><td>72.0</td><td>12155.0</td><td>11809.0</td><td>8042.0</td><td>3663.0</td><td>708.0</td><td>707.0</td><td>2000.0</td><td>248.0</td><td>104.0</td><td>0.0</td><td>0.0</td><td>104.0</td><td>346.0</td><td>&hellip;</td><td>16.0</td><td>341.0</td><td>123.0</td><td>341.0</td><td>65.0</td><td>102.0</td><td>0.0</td><td>476.0</td><td>0.0</td><td>283.0</td><td>0.0</td><td>0.0</td><td>78.0</td><td>23.0</td><td>0.0</td><td>10.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>41.0</td><td>41.0</td><td>797.0</td><td>0.0</td><td>0.0</td><td>219.0</td><td>64.0</td><td>201.0</td><td>155.0</td><td>38.0</td><td>0.0</td><td>102.0</td><td>18.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>17.239</td><td>0.001418</td></tr><tr><td>35005</td><td>14</td><td>0.001867</td><td>0.001698</td><td>41.7</td><td>36.9</td><td>46.2</td><td>8247.0</td><td>8247.0</td><td>3022.0</td><td>4559.0</td><td>0.0</td><td>86.0</td><td>538.0</td><td>0.0</td><td>42.0</td><td>18.0</td><td>24.0</td><td>8247.0</td><td>7955.0</td><td>0.0</td><td>0.0</td><td>30.0</td><td>262.0</td><td>8247.0</td><td>7955.0</td><td>6295.0</td><td>1660.0</td><td>63.0</td><td>659.0</td><td>795.0</td><td>143.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>292.0</td><td>&hellip;</td><td>53.0</td><td>382.0</td><td>32.0</td><td>203.0</td><td>2.0</td><td>4.0</td><td>34.0</td><td>240.0</td><td>0.0</td><td>81.0</td><td>2.0</td><td>5.0</td><td>3.0</td><td>39.0</td><td>0.0</td><td>41.0</td><td>19.0</td><td>5.0</td><td>21.0</td><td>24.0</td><td>0.0</td><td>840.0</td><td>0.0</td><td>18.0</td><td>196.0</td><td>38.0</td><td>274.0</td><td>158.0</td><td>0.0</td><td>0.0</td><td>33.0</td><td>39.0</td><td>3.0</td><td>65.0</td><td>16.0</td><td>31.747</td><td>0.00385</td></tr><tr><td>35006</td><td>4</td><td>0.001197</td><td>0.001027</td><td>38.5</td><td>38.6</td><td>38.5</td><td>3894.0</td><td>3894.0</td><td>3468.0</td><td>313.0</td><td>2.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>111.0</td><td>94.0</td><td>17.0</td><td>3894.0</td><td>3884.0</td><td>0.0</td><td>0.0</td><td>10.0</td><td>0.0</td><td>3894.0</td><td>3884.0</td><td>3418.0</td><td>466.0</td><td>56.0</td><td>97.0</td><td>293.0</td><td>20.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>10.0</td><td>&hellip;</td><td>0.0</td><td>39.0</td><td>80.0</td><td>98.0</td><td>0.0</td><td>12.0</td><td>190.0</td><td>354.0</td><td>46.0</td><td>87.0</td><td>40.0</td><td>0.0</td><td>95.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>55.0</td><td>0.0</td><td>0.0</td><td>14.0</td><td>17.0</td><td>301.0</td><td>0.0</td><td>24.0</td><td>144.0</td><td>0.0</td><td>28.0</td><td>19.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>42.0</td><td>0.0</td><td>44.0</td><td>0.0</td><td>103.993</td><td>0.026706</td></tr><tr><td>35007</td><td>134</td><td>0.004849</td><td>0.004688</td><td>39.2</td><td>38.6</td><td>39.5</td><td>28586.0</td><td>28586.0</td><td>19927.0</td><td>4502.0</td><td>68.0</td><td>376.0</td><td>0.0</td><td>1936.0</td><td>1777.0</td><td>891.0</td><td>886.0</td><td>28586.0</td><td>26145.0</td><td>0.0</td><td>292.0</td><td>867.0</td><td>1282.0</td><td>28586.0</td><td>26437.0</td><td>17530.0</td><td>8615.0</td><td>1076.0</td><td>1396.0</td><td>5275.0</td><td>868.0</td><td>292.0</td><td>0.0</td><td>0.0</td><td>292.0</td><td>2149.0</td><td>&hellip;</td><td>26.0</td><td>639.0</td><td>279.0</td><td>252.0</td><td>185.0</td><td>128.0</td><td>26.0</td><td>1584.0</td><td>35.0</td><td>1002.0</td><td>38.0</td><td>6.0</td><td>94.0</td><td>32.0</td><td>80.0</td><td>95.0</td><td>39.0</td><td>26.0</td><td>0.0</td><td>131.0</td><td>6.0</td><td>1614.0</td><td>0.0</td><td>99.0</td><td>350.0</td><td>94.0</td><td>422.0</td><td>309.0</td><td>0.0</td><td>0.0</td><td>48.0</td><td>123.0</td><td>33.0</td><td>121.0</td><td>15.0</td><td>39.048</td><td>0.001366</td></tr><tr><td>35010</td><td>44</td><td>0.002172</td><td>0.002274</td><td>42.5</td><td>42.3</td><td>43.2</td><td>19347.0</td><td>19347.0</td><td>11732.0</td><td>5589.0</td><td>22.0</td><td>189.0</td><td>0.0</td><td>447.0</td><td>1368.0</td><td>906.0</td><td>462.0</td><td>19347.0</td><td>18635.0</td><td>104.0</td><td>32.0</td><td>99.0</td><td>477.0</td><td>19347.0</td><td>18771.0</td><td>15851.0</td><td>2784.0</td><td>145.0</td><td>706.0</td><td>1731.0</td><td>202.0</td><td>136.0</td><td>104.0</td><td>0.0</td><td>32.0</td><td>576.0</td><td>&hellip;</td><td>10.0</td><td>226.0</td><td>40.0</td><td>219.0</td><td>67.0</td><td>64.0</td><td>62.0</td><td>845.0</td><td>27.0</td><td>461.0</td><td>60.0</td><td>33.0</td><td>0.0</td><td>36.0</td><td>9.0</td><td>0.0</td><td>0.0</td><td>59.0</td><td>77.0</td><td>83.0</td><td>0.0</td><td>1707.0</td><td>0.0</td><td>14.0</td><td>1193.0</td><td>9.0</td><td>169.0</td><td>218.0</td><td>0.0</td><td>0.0</td><td>34.0</td><td>57.0</td><td>9.0</td><td>0.0</td><td>4.0</td><td>222.707</td><td>0.011511</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 913)\n",
       "┌─────────┬───────────┬────────────────┬────────────────┬───┬─────────────┬─────────────┬──────────┬───────────────────┐\n",
       "│ ZipCode ┆ BEV Count ┆ BEV Proportion ┆ BEV per Capita ┆ … ┆ C24050_E083 ┆ C24050_E084 ┆ LandArea ┆ PopulationDensity │\n",
       "│ ---     ┆ ---       ┆ ---            ┆ ---            ┆   ┆ ---         ┆ ---         ┆ ---      ┆ ---               │\n",
       "│ i64     ┆ i64       ┆ f64            ┆ f64            ┆   ┆ f64         ┆ f64         ┆ f64      ┆ f64               │\n",
       "╞═════════╪═══════════╪════════════════╪════════════════╪═══╪═════════════╪═════════════╪══════════╪═══════════════════╡\n",
       "│ 35004   ┆ 55        ┆ 0.004791       ┆ 0.004525       ┆ … ┆ 0.0         ┆ 0.0         ┆ 17.239   ┆ 0.001418          │\n",
       "│ 35005   ┆ 14        ┆ 0.001867       ┆ 0.001698       ┆ … ┆ 65.0        ┆ 16.0        ┆ 31.747   ┆ 0.00385           │\n",
       "│ 35006   ┆ 4         ┆ 0.001197       ┆ 0.001027       ┆ … ┆ 44.0        ┆ 0.0         ┆ 103.993  ┆ 0.026706          │\n",
       "│ 35007   ┆ 134       ┆ 0.004849       ┆ 0.004688       ┆ … ┆ 121.0       ┆ 15.0        ┆ 39.048   ┆ 0.001366          │\n",
       "│ 35010   ┆ 44        ┆ 0.002172       ┆ 0.002274       ┆ … ┆ 0.0         ┆ 4.0         ┆ 222.707  ┆ 0.011511          │\n",
       "└─────────┴───────────┴────────────────┴────────────────┴───┴─────────────┴─────────────┴──────────┴───────────────────┘"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "combined = bev_metrics.join(acs, on=\"ZipCode\", how=\"left\")\n",
    "\n",
    "print(\"Combined shape:\", combined.shape)\n",
    "print(\"Columns:\", combined.columns[:10], \"...\")\n",
    "combined.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3nvmljka8iu",
   "metadata": {},
   "source": [
    "## 6. Compute derived ACS variables\n",
    "\n",
    "The enhanced dictionary (`ACSEnhanced_Dictionary.csv`) defines 3,194 derived\n",
    "variables whose `Variable` column is itself a formula built from base ACS\n",
    "variable names.  There are three patterns:\n",
    "\n",
    "| Type | Example | Flags |\n",
    "|---|---|---|\n",
    "| Cumulative | `(B02001_E002 + B02001_E003)` | Cumulative=True |\n",
    "| Proportional | `B02001_E002/B02001_E001` | Proportional=True |\n",
    "| Both | `(B02001_E002 + B02001_E003)/B02001_E001` | Cumulative=True, Proportional=True |\n",
    "\n",
    "All formulas use only `+` and `/` operators.  We read the dictionary, filter to\n",
    "derived rows (Cumulative or Proportional is True), then evaluate each formula\n",
    "against the existing columns and add the result as a new column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "c7qxvzz3lt9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total dictionary rows:  3254\n",
      "Derived variables:      2345\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 4)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Variable</th><th>Label</th><th>Cumulative</th><th>Proportional</th></tr><tr><td>str</td><td>str</td><td>bool</td><td>bool</td></tr></thead><tbody><tr><td>&quot;(B05011_E004)&quot;</td><td>&quot;Cumulative up to: Naturalized …</td><td>true</td><td>false</td></tr><tr><td>&quot;(B05011_E004 + B05011_E005)&quot;</td><td>&quot;Cumulative up to: Naturalized …</td><td>true</td><td>false</td></tr><tr><td>&quot;(B05011_E004 + B05011_E005 + B…</td><td>&quot;Cumulative up to: Naturalized …</td><td>true</td><td>false</td></tr><tr><td>&quot;(B05011_E004 + B05011_E005 + B…</td><td>&quot;Cumulative up to: Naturalized …</td><td>true</td><td>false</td></tr><tr><td>&quot;(B05011_E004 + B05011_E005 + B…</td><td>&quot;Cumulative up to: Naturalized …</td><td>true</td><td>false</td></tr><tr><td>&quot;(B05011_E004 + B05011_E005 + B…</td><td>&quot;Cumulative up to: Naturalized …</td><td>true</td><td>false</td></tr><tr><td>&quot;(B06009_E008)&quot;</td><td>&quot;Cumulative up to: Less than hi…</td><td>true</td><td>false</td></tr><tr><td>&quot;(B06009_E008 + B06009_E009)&quot;</td><td>&quot;Cumulative up to: High school …</td><td>true</td><td>false</td></tr><tr><td>&quot;(B06009_E008 + B06009_E009 + B…</td><td>&quot;Cumulative up to: Some college…</td><td>true</td><td>false</td></tr><tr><td>&quot;(B06009_E008 + B06009_E009 + B…</td><td>&quot;Cumulative up to: Bachelor&#x27;s d…</td><td>true</td><td>false</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 4)\n",
       "┌─────────────────────────────────┬─────────────────────────────────┬────────────┬──────────────┐\n",
       "│ Variable                        ┆ Label                           ┆ Cumulative ┆ Proportional │\n",
       "│ ---                             ┆ ---                             ┆ ---        ┆ ---          │\n",
       "│ str                             ┆ str                             ┆ bool       ┆ bool         │\n",
       "╞═════════════════════════════════╪═════════════════════════════════╪════════════╪══════════════╡\n",
       "│ (B05011_E004)                   ┆ Cumulative up to: Naturalized … ┆ true       ┆ false        │\n",
       "│ (B05011_E004 + B05011_E005)     ┆ Cumulative up to: Naturalized … ┆ true       ┆ false        │\n",
       "│ (B05011_E004 + B05011_E005 + B… ┆ Cumulative up to: Naturalized … ┆ true       ┆ false        │\n",
       "│ (B05011_E004 + B05011_E005 + B… ┆ Cumulative up to: Naturalized … ┆ true       ┆ false        │\n",
       "│ (B05011_E004 + B05011_E005 + B… ┆ Cumulative up to: Naturalized … ┆ true       ┆ false        │\n",
       "│ (B05011_E004 + B05011_E005 + B… ┆ Cumulative up to: Naturalized … ┆ true       ┆ false        │\n",
       "│ (B06009_E008)                   ┆ Cumulative up to: Less than hi… ┆ true       ┆ false        │\n",
       "│ (B06009_E008 + B06009_E009)     ┆ Cumulative up to: High school … ┆ true       ┆ false        │\n",
       "│ (B06009_E008 + B06009_E009 + B… ┆ Cumulative up to: Some college… ┆ true       ┆ false        │\n",
       "│ (B06009_E008 + B06009_E009 + B… ┆ Cumulative up to: Bachelor's d… ┆ true       ┆ false        │\n",
       "└─────────────────────────────────┴─────────────────────────────────┴────────────┴──────────────┘"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dictionary = pl.read_csv(\"ACSEnhanced_Dictionary.csv\")\n",
    "\n",
    "derived = dictionary.filter(\n",
    "    (pl.col(\"Cumulative\") == True) | (pl.col(\"Proportional\") == True)\n",
    ")\n",
    "\n",
    "print(f\"Total dictionary rows:  {dictionary.height}\")\n",
    "print(f\"Derived variables:      {derived.height}\")\n",
    "derived.select(\"Variable\", \"Label\", \"Cumulative\", \"Proportional\").head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "tx2pbm7nj7q",
   "metadata": {},
   "source": [
    "### Parse and evaluate each formula\n",
    "\n",
    "Each formula's `Variable` name is parsed into a Polars expression:\n",
    "\n",
    "- Variable references like `B02001_E002` become `pl.col(\"B02001_E002\")`\n",
    "- `+` maps to Polars addition\n",
    "- `/` maps to Polars division (with `.fill_nan(0.0)` to handle 0/0 cases)\n",
    "- Parentheses are handled by splitting on `/` first (lower precedence), then\n",
    "  summing the `+`-separated terms within each part.\n",
    "\n",
    "Formulas that reference columns not present in the data (e.g. variables excluded\n",
    "during download) are skipped with a warning count."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "crmxwe215bn",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Derived columns to compute: 2345\n",
      "Skipped (missing references): 0\n"
     ]
    }
   ],
   "source": [
    "import re\n",
    "\n",
    "available_cols = set(combined.columns)\n",
    "\n",
    "def parse_sum(term: str) -> pl.Expr:\n",
    "    \"\"\"Parse a '+'-separated sum of variable names into a Polars expression.\"\"\"\n",
    "    parts = [t.strip() for t in term.split(\"+\")]\n",
    "    expr = pl.col(parts[0])\n",
    "    for p in parts[1:]:\n",
    "        expr = expr + pl.col(p)\n",
    "    return expr\n",
    "\n",
    "def formula_to_expr(formula: str) -> pl.Expr:\n",
    "    \"\"\"Convert a derived-variable formula string into a Polars expression.\"\"\"\n",
    "    # Strip outer whitespace\n",
    "    formula = formula.strip()\n",
    "    # Remove outer parentheses that wrap the entire formula (cumulative-only)\n",
    "    clean = re.sub(r\"[()]\", \"\", formula)\n",
    "\n",
    "    if \"/\" in clean:\n",
    "        numerator, denominator = clean.split(\"/\", 1)\n",
    "        return (parse_sum(numerator) / parse_sum(denominator)).fill_nan(0.0)\n",
    "    else:\n",
    "        return parse_sum(clean)\n",
    "\n",
    "def formula_vars(formula: str) -> list[str]:\n",
    "    \"\"\"Extract all variable references from a formula string.\"\"\"\n",
    "    return re.findall(r\"[A-Z][A-Za-z0-9_]+_E\\d+|LandArea|PopulationDensity\", formula)\n",
    "\n",
    "# Build all derived columns\n",
    "new_exprs = []\n",
    "skipped = 0\n",
    "\n",
    "for row in derived.iter_rows(named=True):\n",
    "    var_name = row[\"Variable\"]\n",
    "    refs = formula_vars(var_name)\n",
    "    if not refs or not all(r in available_cols for r in refs):\n",
    "        skipped += 1\n",
    "        continue\n",
    "    new_exprs.append(formula_to_expr(var_name).alias(var_name))\n",
    "\n",
    "print(f\"Derived columns to compute: {len(new_exprs)}\")\n",
    "print(f\"Skipped (missing references): {skipped}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "3evvijbme2j",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Combined shape after derived variables: (529, 3258)\n",
      "  Rows: 529\n",
      "  Columns: 3258 (4 BEV + 909 base ACS + 2345 derived)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 9)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>ZipCode</th><th>BEV Count</th><th>BEV Proportion</th><th>BEV per Capita</th><th>(B25133_E004 + B25133_E005 + B25133_E006)/B25133_E001</th><th>(B25133_E004 + B25133_E005 + B25133_E006 + B25133_E007)/B25133_E003</th><th>(B25133_E004 + B25133_E005 + B25133_E006 + B25133_E007)/B25133_E001</th><th>(B25133_E004 + B25133_E005 + B25133_E006 + B25133_E007 + B25133_E008)/B25133_E003</th><th>(B25133_E004 + B25133_E005 + B25133_E006 + B25133_E007 + B25133_E008)/B25133_E001</th></tr><tr><td>i64</td><td>i64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>35004</td><td>55</td><td>0.004791</td><td>0.004525</td><td>0.370097</td><td>0.753077</td><td>0.402136</td><td>0.888077</td><td>0.474225</td></tr><tr><td>35005</td><td>14</td><td>0.001867</td><td>0.001698</td><td>0.149614</td><td>0.389179</td><td>0.200268</td><td>0.531291</td><td>0.273398</td></tr><tr><td>35006</td><td>4</td><td>0.001197</td><td>0.001027</td><td>0.159798</td><td>0.738636</td><td>0.187997</td><td>0.818182</td><td>0.208243</td></tr><tr><td>35007</td><td>134</td><td>0.004849</td><td>0.004688</td><td>0.280972</td><td>0.84428</td><td>0.33818</td><td>0.926917</td><td>0.371281</td></tr><tr><td>35010</td><td>44</td><td>0.002172</td><td>0.002274</td><td>0.172777</td><td>0.712201</td><td>0.21052</td><td>0.787596</td><td>0.232806</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 9)\n",
       "┌─────────┬───────────┬────────────┬──────────┬───┬─────────────────┬────────────────┬────────────────┬────────────────┐\n",
       "│ ZipCode ┆ BEV Count ┆ BEV        ┆ BEV per  ┆ … ┆ (B25133_E004 +  ┆ (B25133_E004 + ┆ (B25133_E004 + ┆ (B25133_E004 + │\n",
       "│ ---     ┆ ---       ┆ Proportion ┆ Capita   ┆   ┆ B25133_E005 +   ┆ B25133_E005 +  ┆ B25133_E005 +  ┆ B25133_E005 +  │\n",
       "│ i64     ┆ i64       ┆ ---        ┆ ---      ┆   ┆ B…              ┆ B…             ┆ B…             ┆ B…             │\n",
       "│         ┆           ┆ f64        ┆ f64      ┆   ┆ ---             ┆ ---            ┆ ---            ┆ ---            │\n",
       "│         ┆           ┆            ┆          ┆   ┆ f64             ┆ f64            ┆ f64            ┆ f64            │\n",
       "╞═════════╪═══════════╪════════════╪══════════╪═══╪═════════════════╪════════════════╪════════════════╪════════════════╡\n",
       "│ 35004   ┆ 55        ┆ 0.004791   ┆ 0.004525 ┆ … ┆ 0.753077        ┆ 0.402136       ┆ 0.888077       ┆ 0.474225       │\n",
       "│ 35005   ┆ 14        ┆ 0.001867   ┆ 0.001698 ┆ … ┆ 0.389179        ┆ 0.200268       ┆ 0.531291       ┆ 0.273398       │\n",
       "│ 35006   ┆ 4         ┆ 0.001197   ┆ 0.001027 ┆ … ┆ 0.738636        ┆ 0.187997       ┆ 0.818182       ┆ 0.208243       │\n",
       "│ 35007   ┆ 134       ┆ 0.004849   ┆ 0.004688 ┆ … ┆ 0.84428         ┆ 0.33818        ┆ 0.926917       ┆ 0.371281       │\n",
       "│ 35010   ┆ 44        ┆ 0.002172   ┆ 0.002274 ┆ … ┆ 0.712201        ┆ 0.21052        ┆ 0.787596       ┆ 0.232806       │\n",
       "└─────────┴───────────┴────────────┴──────────┴───┴─────────────────┴────────────────┴────────────────┴────────────────┘"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "combined = combined.with_columns(new_exprs)\n",
    "\n",
    "print(\"Combined shape after derived variables:\", combined.shape)\n",
    "print(f\"  Rows: {combined.height}\")\n",
    "print(f\"  Columns: {combined.width} (4 BEV + 909 base ACS + {combined.width - 4 - 909} derived)\")\n",
    "combined.select(combined.columns[:4] + combined.columns[-5:]).head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cell-10",
   "metadata": {},
   "source": [
    "## 7. Save to CombinedInput.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "cell-11",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Saved CombinedInput.csv\n",
      "  Rows (ZIP codes): 529\n",
      "  Columns: 3258 (4 BEV metrics + 3254 ACS variables)\n"
     ]
    }
   ],
   "source": [
    "combined.write_csv(\"CombinedInput.csv\")\n",
    "print(\"Saved CombinedInput.csv\")\n",
    "print(f\"  Rows (ZIP codes): {combined.height}\")\n",
    "print(f\"  Columns: {combined.width} (4 BEV metrics + {combined.width - 4} ACS variables)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "rh1k578i88b",
   "metadata": {},
   "source": [
    "## 8. Forward Groupwise Stepwise Regression\n",
    "\n",
    "The algorithm below implements **forward groupwise stepwise selection** using\n",
    "the adjusted $R^2$ criterion.  It is parameterised along two dimensions:\n",
    "\n",
    "### Dimension 1 — Grouping level\n",
    "\n",
    "| Value | Dictionary column | Description |\n",
    "|---|---|---|\n",
    "| `\"macro\"` | `Group` | 55 ACS table groups (e.g. `B02001`) |\n",
    "| `\"micro\"` | `SubGroupID` | 865 sub-groups (e.g. `B02001_2`, `B02001_2_C_P1`) |\n",
    "\n",
    "### Dimension 2 — Response mode\n",
    "\n",
    "| Value | Description |\n",
    "|---|---|\n",
    "| `\"individual\"` | Run separately for each of the 3 DVs; selection guided by that DV's Adj. $R^2$ |\n",
    "| `\"aggregate\"` | Run once, selecting the group that maximises the **mean** Adj. $R^2$ across all 3 DVs simultaneously |\n",
    "\n",
    "### Combinations\n",
    "\n",
    "All four combinations are executed.  Each produces its own CSV:\n",
    "\n",
    "| File | Grouping | Response |\n",
    "|---|---|---|\n",
    "| `Results_macro_individual.csv` | macro | individual |\n",
    "| `Results_macro_aggregate.csv` | macro | aggregate |\n",
    "| `Results_micro_individual.csv` | micro | individual |\n",
    "| `Results_micro_aggregate.csv` | micro | aggregate |\n",
    "\n",
    "A combination is **skipped** if its output file already exists in the folder."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "k6piztg89pi",
   "metadata": {},
   "source": [
    "### Configuration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "b8krs24159s",
   "metadata": {},
   "outputs": [],
   "source": [
    "RESPONSES = [\"BEV Count\", \"BEV Proportion\", \"BEV per Capita\"]\n",
    "SENTINEL = -666666666.0         # ACS missing-data sentinel\n",
    "MAX_ITER = None                 # defaults to |G| (number of groups)\n",
    "GROUPING_LEVELS = [\"macro\", \"micro\"]\n",
    "RESPONSE_MODES = [\"individual\", \"aggregate\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5utb3nii25l",
   "metadata": {},
   "source": [
    "### Build the group → columns mappings (macro and micro)\n",
    "\n",
    "Each row in the enhanced dictionary maps a variable name to both a macro group\n",
    "(`Group`, e.g. `B02001`) and a micro sub-group (`SubGroupID`, e.g.\n",
    "`B02001_2_C_P1`).  We build both mappings so the algorithm can operate at\n",
    "either level.  Sentinel values (`-666666666`) are replaced with `0`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "tm1diihejw",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Observations (n): 529\n",
      "Predictor columns: 3254\n",
      "Macro groups: 55\n",
      "Micro sub-groups: 702\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "n = combined.height\n",
    "\n",
    "# --- Predictor columns (everything except BEV metrics and ZipCode) ---\n",
    "bev_cols = {\"ZipCode\", \"BEV Count\", \"BEV Proportion\", \"BEV per Capita\"}\n",
    "predictor_cols = [c for c in combined.columns if c not in bev_cols]\n",
    "col_to_idx = {col: i for i, col in enumerate(predictor_cols)}\n",
    "\n",
    "# Build full X matrix (n × total predictors)\n",
    "X_full = combined.select(predictor_cols).to_numpy().astype(np.float64)\n",
    "\n",
    "# --- Count missing values before replacement ---\n",
    "n_sentinel = int(np.sum(X_full == SENTINEL))\n",
    "n_nan = int(np.sum(np.isnan(X_full)))\n",
    "n_inf = int(np.sum(np.isinf(X_full)))\n",
    "total_cells = X_full.shape[0] * X_full.shape[1]\n",
    "\n",
    "# Per-group sentinel counts\n",
    "group_sentinel_rows = []\n",
    "for row in dictionary.iter_rows(named=True):\n",
    "    var_name = row[\"Variable\"]\n",
    "    group_id = row[\"Group\"]\n",
    "    if var_name in col_to_idx:\n",
    "        idx = col_to_idx[var_name]\n",
    "        col_sentinels = int(np.sum(X_full[:, idx] == SENTINEL))\n",
    "        col_nans = int(np.sum(np.isnan(X_full[:, idx])))\n",
    "        if col_sentinels > 0 or col_nans > 0:\n",
    "            group_sentinel_rows.append({\n",
    "                \"Group\": group_id,\n",
    "                \"Variable\": var_name,\n",
    "                \"Sentinels\": col_sentinels,\n",
    "                \"NaNs\": col_nans,\n",
    "            })\n",
    "\n",
    "# Replace sentinel and NaN values with 0\n",
    "X_full = np.where(X_full == SENTINEL, 0.0, X_full)\n",
    "X_full = np.nan_to_num(X_full, nan=0.0, posinf=0.0, neginf=0.0)\n",
    "\n",
    "# --- Build BOTH group → column-index mappings ---\n",
    "macro_to_idx: dict[str, list[int]] = {}   # Group column\n",
    "micro_to_idx: dict[str, list[int]] = {}   # SubGroupID column\n",
    "\n",
    "for row in dictionary.iter_rows(named=True):\n",
    "    var_name = row[\"Variable\"]\n",
    "    if var_name not in col_to_idx:\n",
    "        continue\n",
    "    idx = col_to_idx[var_name]\n",
    "    macro_to_idx.setdefault(row[\"Group\"], []).append(idx)\n",
    "    micro_to_idx.setdefault(row[\"SubGroupID\"], []).append(idx)\n",
    "\n",
    "# Remove empty entries\n",
    "macro_to_idx = {g: v for g, v in macro_to_idx.items() if v}\n",
    "micro_to_idx = {g: v for g, v in micro_to_idx.items() if v}\n",
    "\n",
    "# Lookup dict for the algorithm\n",
    "grouping_maps = {\n",
    "    \"macro\": macro_to_idx,\n",
    "    \"micro\": micro_to_idx,\n",
    "}\n",
    "\n",
    "# --- Response vectors ---\n",
    "y_vectors = {\n",
    "    resp: combined.get_column(resp).cast(pl.Float64).to_numpy()\n",
    "    for resp in RESPONSES\n",
    "}\n",
    "\n",
    "print(f\"Observations (n): {n}\")\n",
    "print(f\"Predictor columns: {len(predictor_cols)}\")\n",
    "print(f\"Macro groups: {len(macro_to_idx)}\")\n",
    "print(f\"Micro sub-groups: {len(micro_to_idx)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dzcrlah8n4t",
   "metadata": {},
   "source": [
    "### Missing value summary\n",
    "\n",
    "The ACS data uses the sentinel value `-666666666` for suppressed or unavailable\n",
    "estimates.  Derived proportional columns may also produce `NaN` (0/0) or `Inf`\n",
    "(x/0).  All such values are replaced with `0` before regression.  The summary\n",
    "below reports how many were found."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "li8k8zcsckp",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Design matrix: 529 rows × 3254 columns = 1,721,366 cells\n",
      "  Sentinel (-666666666):     4,544  (0.2640%)\n",
      "  NaN:                            261  (0.0152%)\n",
      "  Inf:                              0  (0.0000%)\n",
      "  Total replaced with 0:        4,805  (0.2791%)\n",
      "\n",
      "Columns with at least one sentinel: 44\n",
      "Columns with at least one NaN:      9\n"
     ]
    }
   ],
   "source": [
    "print(f\"Design matrix: {X_full.shape[0]} rows × {X_full.shape[1]} columns = {total_cells:,} cells\")\n",
    "print(f\"  Sentinel ({SENTINEL:.0f}):  {n_sentinel:>8,}  ({n_sentinel / total_cells:.4%})\")\n",
    "print(f\"  NaN:                       {n_nan:>8,}  ({n_nan / total_cells:.4%})\")\n",
    "print(f\"  Inf:                       {n_inf:>8,}  ({n_inf / total_cells:.4%})\")\n",
    "print(f\"  Total replaced with 0:     {n_sentinel + n_nan + n_inf:>8,}  ({(n_sentinel + n_nan + n_inf) / total_cells:.4%})\")\n",
    "\n",
    "# Columns affected\n",
    "cols_with_sentinel = sum(1 for r in group_sentinel_rows if r[\"Sentinels\"] > 0)\n",
    "cols_with_nan = sum(1 for r in group_sentinel_rows if r[\"NaNs\"] > 0)\n",
    "print(f\"\\nColumns with at least one sentinel: {cols_with_sentinel}\")\n",
    "print(f\"Columns with at least one NaN:      {cols_with_nan}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ohjy3682j8",
   "metadata": {},
   "source": [
    "### Missing values by group\n",
    "\n",
    "Aggregated sentinel + NaN counts per ACS group, showing only groups that have\n",
    "at least one affected cell.  This helps identify which groups may be less\n",
    "reliable due to suppressed data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "wxkjdr9uuw",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Groups with missing values: 12 of 55\n"
     ]
    }
   ],
   "source": [
    "if group_sentinel_rows:\n",
    "    missing_detail = pl.DataFrame(group_sentinel_rows)\n",
    "\n",
    "    by_group = (\n",
    "        missing_detail\n",
    "        .group_by(\"Group\")\n",
    "        .agg(\n",
    "            pl.col(\"Variable\").count().alias(\"Columns Affected\"),\n",
    "            pl.col(\"Sentinels\").sum().alias(\"Total Sentinels\"),\n",
    "            pl.col(\"NaNs\").sum().alias(\"Total NaNs\"),\n",
    "        )\n",
    "        .with_columns(\n",
    "            (pl.col(\"Total Sentinels\") + pl.col(\"Total NaNs\")).alias(\"Total Missing\"),\n",
    "        )\n",
    "        .sort(\"Total Missing\", descending=True)\n",
    "    )\n",
    "\n",
    "    print(f\"Groups with missing values: {by_group.height} of {len(macro_to_idx)}\")\n",
    "    by_group\n",
    "else:\n",
    "    print(\"No missing values found in the design matrix.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "pwrhggxl39",
   "metadata": {},
   "source": [
    "### Adjusted $R^2$ helper\n",
    "\n",
    "Given a design matrix $X$ (without intercept) and response $y$, the function\n",
    "adds an intercept column, fits OLS via `numpy.linalg.lstsq`, and computes:\n",
    "\n",
    "$$\\text{Adj.}\\;R^2 = 1 - \\frac{SS_{\\text{res}} / (n - k)}{SS_{\\text{tot}} / (n - 1)}$$\n",
    "\n",
    "where $k$ = number of regressors including the intercept."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "2zipxwwypg2",
   "metadata": {},
   "outputs": [],
   "source": [
    "def adjusted_r2(X: np.ndarray, y: np.ndarray) -> float:\n",
    "    \"\"\"Compute adjusted R² for OLS regression of y on X (intercept added automatically).\"\"\"\n",
    "    n = X.shape[0]\n",
    "    X_int = np.column_stack([np.ones(n), X])   # prepend intercept\n",
    "    k = X_int.shape[1]                         # total regressors incl. intercept\n",
    "    if k >= n:\n",
    "        return -np.inf                         # not enough degrees of freedom\n",
    "\n",
    "    beta, _, _, _ = np.linalg.lstsq(X_int, y, rcond=None)\n",
    "    ss_res = np.sum((y - X_int @ beta) ** 2)\n",
    "    ss_tot = np.sum((y - y.mean()) ** 2)\n",
    "    if ss_tot == 0:\n",
    "        return -np.inf\n",
    "    r2 = 1.0 - ss_res / ss_tot\n",
    "    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "la5zaofon69",
   "metadata": {},
   "source": [
    "### Forward selection functions\n",
    "\n",
    "`forward_selection_individual` runs the algorithm for a single response\n",
    "variable.  `forward_selection_aggregate` runs it once, selecting the group\n",
    "that maximises the **mean** Adj. $R^2$ across all response variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "sh36xdogp9m",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Functions defined.\n"
     ]
    }
   ],
   "source": [
    "from collections.abc import Iterator\n",
    "\n",
    "def forward_selection_individual(\n",
    "    X: np.ndarray,\n",
    "    y: np.ndarray,\n",
    "    grp_to_idx: dict[str, list[int]],\n",
    "    max_iter: int | None = None,\n",
    ") -> Iterator[dict]:\n",
    "    \"\"\"Forward groupwise selection for a single response variable (generator).\"\"\"\n",
    "    G = set(grp_to_idx.keys())\n",
    "    S: set[str] = set()\n",
    "    R = 0.0\n",
    "    improved = True\n",
    "    i = 0\n",
    "    _max = max_iter or len(G)\n",
    "\n",
    "    while improved and i < _max:\n",
    "        candidates = G - S\n",
    "        if not candidates:\n",
    "            break\n",
    "\n",
    "        current_idx = []\n",
    "        for g in S:\n",
    "            current_idx.extend(grp_to_idx[g])\n",
    "\n",
    "        best_g = None\n",
    "        best_adj = -np.inf\n",
    "        for g in candidates:\n",
    "            trial_idx = current_idx + grp_to_idx[g]\n",
    "            adj = adjusted_r2(X[:, trial_idx], y)\n",
    "            if adj > best_adj:\n",
    "                best_adj = adj\n",
    "                best_g = g\n",
    "\n",
    "        if best_adj > R:\n",
    "            S.add(best_g)\n",
    "            R = best_adj\n",
    "            yield {\n",
    "                \"Step\": i + 1,\n",
    "                \"Group Added\": best_g,\n",
    "                \"Adj R²\": round(R, 6),\n",
    "                \"Cumul. Groups\": len(S),\n",
    "                \"Cumul. Predictors\": sum(len(grp_to_idx[g]) for g in S),\n",
    "            }\n",
    "        else:\n",
    "            improved = False\n",
    "        i += 1\n",
    "\n",
    "\n",
    "def forward_selection_aggregate(\n",
    "    X: np.ndarray,\n",
    "    ys: dict[str, np.ndarray],\n",
    "    grp_to_idx: dict[str, list[int]],\n",
    "    max_iter: int | None = None,\n",
    ") -> Iterator[dict]:\n",
    "    \"\"\"Forward groupwise selection guided by mean Adj R² across all responses (generator).\"\"\"\n",
    "    G = set(grp_to_idx.keys())\n",
    "    S: set[str] = set()\n",
    "    R = 0.0\n",
    "    improved = True\n",
    "    i = 0\n",
    "    _max = max_iter or len(G)\n",
    "    resp_names = list(ys.keys())\n",
    "\n",
    "    while improved and i < _max:\n",
    "        candidates = G - S\n",
    "        if not candidates:\n",
    "            break\n",
    "\n",
    "        current_idx = []\n",
    "        for g in S:\n",
    "            current_idx.extend(grp_to_idx[g])\n",
    "\n",
    "        best_g = None\n",
    "        best_mean_adj = -np.inf\n",
    "        best_per_resp: dict[str, float] = {}\n",
    "        for g in candidates:\n",
    "            trial_idx = current_idx + grp_to_idx[g]\n",
    "            X_trial = X[:, trial_idx]\n",
    "            adjs = {r: adjusted_r2(X_trial, y) for r, y in ys.items()}\n",
    "            mean_adj = np.mean(list(adjs.values()))\n",
    "            if mean_adj > best_mean_adj:\n",
    "                best_mean_adj = mean_adj\n",
    "                best_g = g\n",
    "                best_per_resp = adjs\n",
    "\n",
    "        if best_mean_adj > R:\n",
    "            S.add(best_g)\n",
    "            R = best_mean_adj\n",
    "            row = {\n",
    "                \"Step\": i + 1,\n",
    "                \"Group Added\": best_g,\n",
    "                \"Mean Adj R²\": round(R, 6),\n",
    "                \"Cumul. Groups\": len(S),\n",
    "                \"Cumul. Predictors\": sum(len(grp_to_idx[g]) for g in S),\n",
    "            }\n",
    "            for r in resp_names:\n",
    "                row[f\"Adj R² ({r})\"] = round(best_per_resp[r], 6)\n",
    "            yield row\n",
    "        else:\n",
    "            improved = False\n",
    "        i += 1\n",
    "\n",
    "print(\"Functions defined.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "k8z8op8cwo",
   "metadata": {},
   "source": [
    "### Run all parameter combinations\n",
    "\n",
    "Each combination of (`grouping_level`, `response_mode`) produces one CSV.\n",
    "If the file already exists it is skipped."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "hkfvaqw6mn9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SKIP  Results_macro_individual.csv  (already exists)\n",
      "SKIP  Results_macro_aggregate.csv  (already exists)\n",
      "\n",
      "=================================================================\n",
      "  Grouping: micro (702 groups)  |  Mode: individual\n",
      "=================================================================\n",
      "  [BEV Count         ] Step   1: +B15002_3              Adj R² = 0.944761  (16 vars)\n",
      "  [BEV Count         ] Step   2: +B25104_2              Adj R² = 0.960681  (32 vars)\n",
      "  [BEV Count         ] Step   3: +B06009_4_C            Adj R² = 0.967735  (36 vars)\n",
      "  [BEV Count         ] Step   4: +B25040_2              Adj R² = 0.971215  (45 vars)\n",
      "  [BEV Count         ] Step   5: +C24050_2              Adj R² = 0.973886  (63 vars)\n",
      "  [BEV Count         ] Step   6: +B25132_3_C            Adj R² = 0.976353  (68 vars)\n",
      "  [BEV Count         ] Step   7: +B15002_4_C            Adj R² = 0.977978  (83 vars)\n",
      "  [BEV Count         ] Step   8: +C24050_4              Adj R² = 0.979523  (96 vars)\n",
      "  [BEV Count         ] Step   9: +B15011_8              Adj R² = 0.981060  (101 vars)\n",
      "  [BEV Count         ] Step  10: +B08202_5_C            Adj R² = 0.982488  (104 vars)\n",
      "  [BEV Count         ] Step  11: +B08203_2_C            Adj R² = 0.983499  (112 vars)\n",
      "  [BEV Count         ] Step  12: +B08012_3_C            Adj R² = 0.984656  (123 vars)\n",
      "  [BEV Count         ] Step  13: +B08122_2              Adj R² = 0.985362  (132 vars)\n",
      "  [BEV Count         ] Step  14: +B08016_6_P1           Adj R² = 0.986149  (134 vars)\n",
      "  [BEV Count         ] Step  15: +B19101_2_C            Adj R² = 0.986785  (149 vars)\n",
      "  [BEV Count         ] Step  16: +C24050_7              Adj R² = 0.987739  (162 vars)\n",
      "  [BEV Count         ] Step  17: +B15011_9              Adj R² = 0.988417  (167 vars)\n",
      "  [BEV Count         ] Step  18: +B25133_3_C            Adj R² = 0.988959  (172 vars)\n",
      "  [BEV Count         ] Step  19: +B05002_7              Adj R² = 0.989491  (178 vars)\n",
      "  [BEV Count         ] Step  20: +B23027_10             Adj R² = 0.989965  (180 vars)\n",
      "  [BEV Count         ] Step  21: +B02001_2              Adj R² = 0.990315  (187 vars)\n",
      "  [BEV Count         ] Step  22: +B08303_2_C            Adj R² = 0.990828  (198 vars)\n",
      "  [BEV Count         ] Step  23: +B08201_5_P2           Adj R² = 0.991177  (203 vars)\n",
      "  [BEV Count         ] Step  24: +B08124_3              Adj R² = 0.991471  (209 vars)\n",
      "  [BEV Count         ] Step  25: +C17002_2              Adj R² = 0.991786  (216 vars)\n",
      "  [BEV Count         ] Step  26: +C24050_5              Adj R² = 0.992074  (229 vars)\n",
      "  [BEV Count         ] Step  27: +B08006_10             Adj R² = 0.992455  (235 vars)\n",
      "  [BEV Count         ] Step  28: +B06009_5              Adj R² = 0.992912  (240 vars)\n",
      "  [BEV Count         ] Step  29: +B08122_5_C            Adj R² = 0.993219  (242 vars)\n",
      "  [BEV Count         ] Step  30: +B08017_6_P2           Adj R² = 0.993431  (244 vars)\n",
      "  [BEV Count         ] Step  31: +B19126_5              Adj R² = 0.993687  (246 vars)\n",
      "  [BEV Count         ] Step  32: +B08122_8_P1           Adj R² = 0.993928  (249 vars)\n",
      "  [BEV Count         ] Step  33: +B19019_2              Adj R² = 0.994089  (256 vars)\n",
      "  [BEV Count         ] Step  34: +B15002_4_C_P2         Adj R² = 0.994245  (271 vars)\n",
      "  [BEV Count         ] Step  35: +B25132_3_C_P2         Adj R² = 0.994430  (276 vars)\n",
      "  [BEV Count         ] Step  36: +B15011_7_C_P2         Adj R² = 0.994578  (278 vars)\n",
      "  [BEV Count         ] Step  37: +B05002_5_P1           Adj R² = 0.994767  (281 vars)\n",
      "  [BEV Count         ] Step  38: +B08201_5_C            Adj R² = 0.994982  (285 vars)\n",
      "  [BEV Count         ] Step  39: +B08301_2_P1           Adj R² = 0.995147  (293 vars)\n",
      "  [BEV Count         ] Step  40: +B08202_4_C            Adj R² = 0.995247  (295 vars)\n",
      "  [BEV Count         ] Step  41: +B15011_6              Adj R² = 0.995371  (300 vars)\n",
      "  [BEV Count         ] Step  42: +B17020_3_C            Adj R² = 0.995502  (306 vars)\n",
      "  [BEV Count         ] Step  43: +B06009_3_C_P1         Adj R² = 0.995615  (310 vars)\n",
      "  [BEV Count         ] Step  44: +B06012_5_C_P2         Adj R² = 0.995717  (312 vars)\n",
      "  [BEV Count         ] Step  45: +B19113_1              Adj R² = 0.995814  (313 vars)\n",
      "  [BEV Count         ] Step  46: +B06010_3_C            Adj R² = 0.995926  (320 vars)\n",
      "  [BEV Count         ] Step  47: +B08016_9_P1           Adj R² = 0.996027  (322 vars)\n",
      "  [BEV Count         ] Step  48: +B23020_2              Adj R² = 0.996132  (324 vars)\n",
      "  [BEV Count         ] Step  49: +C24050_3_P2           Adj R² = 0.996287  (337 vars)\n",
      "  [BEV Count         ] Step  50: +B08017_6_P3           Adj R² = 0.996412  (339 vars)\n",
      "  [BEV Count         ] Step  51: +B23027_11             Adj R² = 0.996548  (341 vars)\n",
      "  [BEV Count         ] Step  52: +B08124_5              Adj R² = 0.996675  (347 vars)\n",
      "  [BEV Count         ] Step  53: +B06012_6_C            Adj R² = 0.996810  (349 vars)\n",
      "  [BEV Count         ] Step  54: +C24050_7_P2           Adj R² = 0.996903  (362 vars)\n",
      "  [BEV Count         ] Step  55: +B08122_6_C_P1         Adj R² = 0.997032  (364 vars)\n",
      "  [BEV Count         ] Step  56: +B08006_8_P4           Adj R² = 0.997171  (367 vars)\n",
      "  [BEV Count         ] Step  57: +B08203_3_C            Adj R² = 0.997272  (371 vars)\n",
      "  [BEV Count         ] Step  58: +B08016_4_P2           Adj R² = 0.997360  (373 vars)\n",
      "  [BEV Count         ] Step  59: +B08124_7_P2           Adj R² = 0.997454  (379 vars)\n",
      "  [BEV Count         ] Step  60: +B06010_7_C_P1         Adj R² = 0.997545  (386 vars)\n",
      "  [BEV Count         ] Step  61: +C24050_7_P1           Adj R² = 0.997688  (399 vars)\n",
      "  [BEV Count         ] Step  62: +B06012_4_C            Adj R² = 0.997806  (401 vars)\n",
      "  [BEV Count         ] Step  63: +B08124_4              Adj R² = 0.997918  (407 vars)\n",
      "  [BEV Count         ] Step  64: +B25133_3_P2           Adj R² = 0.998115  (413 vars)\n",
      "  [BEV Count         ] Step  65: +B08202_6_C_P1         Adj R² = 0.998226  (416 vars)\n",
      "  [BEV Count         ] Step  66: +B15011_4_P1           Adj R² = 0.998345  (421 vars)\n",
      "  [BEV Count         ] Step  67: +B23027_2_C            Adj R² = 0.998477  (427 vars)\n",
      "  [BEV Count         ] Step  68: +B08014_3_P2           Adj R² = 0.998596  (433 vars)\n",
      "  [BEV Count         ] Step  69: +B08016_3_P2           Adj R² = 0.998688  (437 vars)\n",
      "  [BEV Count         ] Step  70: +B08016_6_P2           Adj R² = 0.998947  (439 vars)\n",
      "  [BEV Count         ] Step  71: +B15011_5              Adj R² = 0.998999  (444 vars)\n",
      "  [BEV Count         ] Step  72: +B17020_4_C            Adj R² = 0.999100  (450 vars)\n",
      "  [BEV Count         ] Step  73: +C24050_5_P1           Adj R² = 0.999185  (463 vars)\n",
      "  [BEV Count         ] Step  74: +B19081_2              Adj R² = 0.999262  (467 vars)\n",
      "  [BEV Count         ] Step  75: +B08202_5_P1           Adj R² = 0.999361  (471 vars)\n",
      "  [BEV Count         ] Step  76: +B19101_2_C_P1         Adj R² = 0.999503  (486 vars)\n",
      "  [BEV Count         ] Step  77: +B17020_3_P2           Adj R² = 0.999582  (493 vars)\n",
      "  [BEV Count         ] Step  78: +B15002_3_C_P1         Adj R² = 0.999670  (508 vars)\n",
      "  [BEV Count         ] Step  79: +B08016_5_P1           Adj R² = 0.999764  (510 vars)\n",
      "  [BEV Count         ] Step  80: +B06010_9_C_P2         Adj R² = 0.999822  (517 vars)\n",
      "  [BEV Count         ] Step  81: +B08203_6_P2           Adj R² = 0.999848  (522 vars)\n",
      "  [BEV Count         ] Step  82: +B23027_8_P1           Adj R² = 0.999856  (524 vars)\n",
      "  [BEV Count         ] → 82 groups, 524 vars, Adj R² = 0.999856\n",
      "  [BEV Proportion    ] Step   1: +B15002_3_P2           Adj R² = 0.756107  (16 vars)\n",
      "  [BEV Proportion    ] Step   2: +B08124_2              Adj R² = 0.795924  (28 vars)\n",
      "  [BEV Proportion    ] Step   3: +B08124_6_P2           Adj R² = 0.817806  (34 vars)\n",
      "  [BEV Proportion    ] Step   4: +B25104_2_C_P1         Adj R² = 0.837444  (49 vars)\n",
      "  [BEV Proportion    ] Step   5: +B25132_3_P2           Adj R² = 0.854993  (55 vars)\n",
      "  [BEV Proportion    ] Step   6: +B06009_4_P2           Adj R² = 0.863752  (60 vars)\n",
      "  [BEV Proportion    ] Step   7: +B08016_4_P3           Adj R² = 0.870320  (62 vars)\n",
      "  [BEV Proportion    ] Step   8: +B25040_2              Adj R² = 0.876013  (71 vars)\n",
      "  [BEV Proportion    ] Step   9: +B15011_7              Adj R² = 0.880884  (74 vars)\n",
      "  [BEV Proportion    ] Step  10: +B05002_7_P3           Adj R² = 0.885444  (80 vars)\n",
      "  [BEV Proportion    ] Step  11: +B15002_3_C_P1         Adj R² = 0.889299  (95 vars)\n",
      "  [BEV Proportion    ] Step  12: +C24050_3              Adj R² = 0.893303  (108 vars)\n",
      "  [BEV Proportion    ] Step  13: +B08201_5_P2           Adj R² = 0.896960  (113 vars)\n",
      "  [BEV Proportion    ] Step  14: +B08203_4_C            Adj R² = 0.900783  (117 vars)\n",
      "  [BEV Proportion    ] Step  15: +B08202_3_P1           Adj R² = 0.903522  (119 vars)\n",
      "  [BEV Proportion    ] Step  16: +B08202_4              Adj R² = 0.906237  (122 vars)\n",
      "  [BEV Proportion    ] Step  17: +B15002_4_C_P1         Adj R² = 0.908814  (137 vars)\n",
      "  [BEV Proportion    ] Step  18: +B08006_13_P2          Adj R² = 0.911273  (142 vars)\n",
      "  [BEV Proportion    ] Step  19: +B06010_7_C_P2         Adj R² = 0.913616  (149 vars)\n",
      "  [BEV Proportion    ] Step  20: +B23027_2              Adj R² = 0.915857  (156 vars)\n",
      "  [BEV Proportion    ] Step  21: +B17020_3_C_P1         Adj R² = 0.918228  (162 vars)\n",
      "  [BEV Proportion    ] Step  22: +B15002_4_C_P2         Adj R² = 0.921138  (177 vars)\n",
      "  [BEV Proportion    ] Step  23: +C24050_7              Adj R² = 0.923221  (190 vars)\n",
      "  [BEV Proportion    ] Step  24: +B19101_2_C_P1         Adj R² = 0.925615  (205 vars)\n",
      "  [BEV Proportion    ] Step  25: +B05002_5_P1           Adj R² = 0.927250  (208 vars)\n",
      "  [BEV Proportion    ] Step  26: +C24050_3_P2           Adj R² = 0.928885  (221 vars)\n",
      "  [BEV Proportion    ] Step  27: +B08016_7_P1           Adj R² = 0.930627  (225 vars)\n",
      "  [BEV Proportion    ] Step  28: +B08016_10_P1          Adj R² = 0.932872  (227 vars)\n",
      "  [BEV Proportion    ] Step  29: +B08122_6              Adj R² = 0.934443  (230 vars)\n",
      "  [BEV Proportion    ] Step  30: +B19126_4              Adj R² = 0.935793  (232 vars)\n",
      "  [BEV Proportion    ] Step  31: +B06009_5_P1           Adj R² = 0.937316  (237 vars)\n",
      "  [BEV Proportion    ] Step  32: +B15011_10_P1          Adj R² = 0.938706  (242 vars)\n",
      "  [BEV Proportion    ] Step  33: +B05002_7              Adj R² = 0.940196  (248 vars)\n",
      "  [BEV Proportion    ] Step  34: +B25132_3_C_P1         Adj R² = 0.941633  (253 vars)\n",
      "  [BEV Proportion    ] Step  35: +B19081_1              Adj R² = 0.942818  (255 vars)\n",
      "  [BEV Proportion    ] Step  36: +B06012_6_P1           Adj R² = 0.944068  (258 vars)\n",
      "  [BEV Proportion    ] Step  37: +C24050_7_P2           Adj R² = 0.945251  (271 vars)\n",
      "  [BEV Proportion    ] Step  38: +B06010_3_C            Adj R² = 0.946879  (278 vars)\n",
      "  [BEV Proportion    ] Step  39: +B15011_6_P1           Adj R² = 0.947978  (283 vars)\n",
      "  [BEV Proportion    ] Step  40: +B15011_8              Adj R² = 0.949055  (288 vars)\n",
      "  [BEV Proportion    ] Step  41: +B25104_2_C            Adj R² = 0.950374  (303 vars)\n",
      "  [BEV Proportion    ] Step  42: +B05002_7_P1           Adj R² = 0.951659  (309 vars)\n",
      "  [BEV Proportion    ] Step  43: +B08202_5_C            Adj R² = 0.953006  (312 vars)\n",
      "  [BEV Proportion    ] Step  44: +B08012_4_C_P1         Adj R² = 0.954410  (323 vars)\n",
      "  [BEV Proportion    ] Step  45: +B08012_4_C            Adj R² = 0.956859  (334 vars)\n",
      "  [BEV Proportion    ] Step  46: +B08012_4_P2           Adj R² = 0.958274  (346 vars)\n",
      "  [BEV Proportion    ] Step  47: +B08006_10             Adj R² = 0.959828  (352 vars)\n",
      "  [BEV Proportion    ] Step  48: +B06009_6_P1           Adj R² = 0.961574  (357 vars)\n",
      "  [BEV Proportion    ] Step  49: +B08303_2_C_P1         Adj R² = 0.962971  (368 vars)\n",
      "  [BEV Proportion    ] Step  50: +B19019_2              Adj R² = 0.964439  (375 vars)\n",
      "  [BEV Proportion    ] Step  51: +B08017_9_P3           Adj R² = 0.966067  (377 vars)\n",
      "  [BEV Proportion    ] Step  52: +B08203_2_C            Adj R² = 0.967331  (385 vars)\n",
      "  [BEV Proportion    ] Step  53: +C24050_4_P1           Adj R² = 0.969097  (398 vars)\n",
      "  [BEV Proportion    ] Step  54: +B28011_3_P2           Adj R² = 0.970715  (402 vars)\n",
      "  [BEV Proportion    ] Step  55: +B08012_3_C            Adj R² = 0.972735  (413 vars)\n",
      "  [BEV Proportion    ] Step  56: +B15011_6              Adj R² = 0.975443  (418 vars)\n",
      "  [BEV Proportion    ] Step  57: +B25133_3_C_P1         Adj R² = 0.976775  (423 vars)\n",
      "  [BEV Proportion    ] Step  58: +B15011_1              Adj R² = 0.978354  (424 vars)\n",
      "  [BEV Proportion    ] Step  59: +B15011_6_P2           Adj R² = 0.979734  (429 vars)\n",
      "  [BEV Proportion    ] Step  60: +B23027_4              Adj R² = 0.980977  (431 vars)\n",
      "  [BEV Proportion    ] Step  61: +B05011_3_C            Adj R² = 0.982748  (437 vars)\n",
      "  [BEV Proportion    ] Step  62: +B08201_6_C_P1         Adj R² = 0.984141  (441 vars)\n",
      "  [BEV Proportion    ] Step  63: +B06010_3_C_P2         Adj R² = 0.986243  (448 vars)\n",
      "  [BEV Proportion    ] Step  64: +B15011_10_P3          Adj R² = 0.987482  (453 vars)\n",
      "  [BEV Proportion    ] Step  65: +B08201_3_C            Adj R² = 0.988567  (457 vars)\n",
      "  [BEV Proportion    ] Step  66: +B02001_2_P1           Adj R² = 0.989874  (464 vars)\n",
      "  [BEV Proportion    ] Step  67: +B06010_10             Adj R² = 0.991296  (466 vars)\n",
      "  [BEV Proportion    ] Step  68: +B17020_3_C_P2         Adj R² = 0.992365  (472 vars)\n",
      "  [BEV Proportion    ] Step  69: +B05002_8_P2           Adj R² = 0.993981  (478 vars)\n",
      "  [BEV Proportion    ] Step  70: +C24050_5_P2           Adj R² = 0.995323  (491 vars)\n",
      "  [BEV Proportion    ] Step  71: +B08124_8_P1           Adj R² = 0.996767  (497 vars)\n",
      "  [BEV Proportion    ] Step  72: +B08201_6              Adj R² = 0.997902  (502 vars)\n",
      "  [BEV Proportion    ] Step  73: +B15011_9_P3           Adj R² = 0.998966  (507 vars)\n",
      "  [BEV Proportion    ] Step  74: +B08018_4_P2           Adj R² = 0.999359  (509 vars)\n",
      "  [BEV Proportion    ] Step  75: +B08124_4              Adj R² = 0.999626  (515 vars)\n",
      "  [BEV Proportion    ] Step  76: +B08122_1              Adj R² = 0.999766  (516 vars)\n",
      "  [BEV Proportion    ] Step  77: +B08017_3_P2           Adj R² = 0.999903  (520 vars)\n",
      "  [BEV Proportion    ] Step  78: +B08014_4_C_P1         Adj R² = 0.999957  (525 vars)\n",
      "  [BEV Proportion    ] Step  79: +B08122_3_C            Adj R² = 0.999976  (527 vars)\n",
      "  [BEV Proportion    ] → 79 groups, 527 vars, Adj R² = 0.999976\n",
      "  [BEV per Capita    ] Step   1: +B15002_3_P2           Adj R² = 0.697796  (16 vars)\n",
      "  [BEV per Capita    ] Step   2: +B06010_7_P3           Adj R² = 0.738899  (24 vars)\n",
      "  [BEV per Capita    ] Step   3: +B25104_2_C_P1         Adj R² = 0.759448  (39 vars)\n",
      "  [BEV per Capita    ] Step   4: +B15002_3_C_P1         Adj R² = 0.778870  (54 vars)\n",
      "  [BEV per Capita    ] Step   5: +B23027_2_C_P1         Adj R² = 0.797046  (60 vars)\n",
      "  [BEV per Capita    ] Step   6: +B06009_6_P2           Adj R² = 0.811307  (65 vars)\n",
      "  [BEV per Capita    ] Step   7: +C24050_3              Adj R² = 0.822237  (78 vars)\n",
      "  [BEV per Capita    ] Step   8: +B08016_3_P2           Adj R² = 0.830660  (82 vars)\n",
      "  [BEV per Capita    ] Step   9: +B06010_7_C_P1         Adj R² = 0.835241  (89 vars)\n",
      "  [BEV per Capita    ] Step  10: +B25132_3_P2           Adj R² = 0.840633  (95 vars)\n",
      "  [BEV per Capita    ] Step  11: +B25040_2              Adj R² = 0.846168  (104 vars)\n",
      "  [BEV per Capita    ] Step  12: +B25132_3_C_P1         Adj R² = 0.850962  (109 vars)\n",
      "  [BEV per Capita    ] Step  13: +B19019_2              Adj R² = 0.854590  (116 vars)\n",
      "  [BEV per Capita    ] Step  14: +B19049_2              Adj R² = 0.858991  (120 vars)\n",
      "  [BEV per Capita    ] Step  15: +B08006_7_P2           Adj R² = 0.862557  (122 vars)\n",
      "  [BEV per Capita    ] Step  16: +B15011_8              Adj R² = 0.865562  (127 vars)\n",
      "  [BEV per Capita    ] Step  17: +B08301_2              Adj R² = 0.868822  (135 vars)\n",
      "  [BEV per Capita    ] Step  18: +B08203_4_C            Adj R² = 0.871899  (139 vars)\n",
      "  [BEV per Capita    ] Step  19: +B08202_2_C_P1         Adj R² = 0.874987  (146 vars)\n",
      "  [BEV per Capita    ] Step  20: +B15011_9_P1           Adj R² = 0.878013  (151 vars)\n",
      "  [BEV per Capita    ] Step  21: +B08006_10_P1          Adj R² = 0.880456  (157 vars)\n",
      "  [BEV per Capita    ] Step  22: +C15010_2              Adj R² = 0.883642  (162 vars)\n",
      "  [BEV per Capita    ] Step  23: +B08303_2_C            Adj R² = 0.886522  (173 vars)\n",
      "  [BEV per Capita    ] Step  24: +B17020_3_C_P1         Adj R² = 0.888978  (179 vars)\n",
      "  [BEV per Capita    ] Step  25: +B19126_3              Adj R² = 0.891663  (181 vars)\n",
      "  [BEV per Capita    ] Step  26: +B15011_8_P2           Adj R² = 0.894214  (186 vars)\n",
      "  [BEV per Capita    ] Step  27: +B15011_9_P2           Adj R² = 0.896406  (191 vars)\n",
      "  [BEV per Capita    ] Step  28: +B08016_10_P1          Adj R² = 0.898569  (193 vars)\n",
      "  [BEV per Capita    ] Step  29: +B25132_3              Adj R² = 0.901295  (199 vars)\n",
      "  [BEV per Capita    ] Step  30: +B06010_3_C            Adj R² = 0.903046  (206 vars)\n",
      "  [BEV per Capita    ] Step  31: +B08122_5_C_P2         Adj R² = 0.904887  (208 vars)\n",
      "  [BEV per Capita    ] Step  32: +B08201_3_P1           Adj R² = 0.906689  (213 vars)\n",
      "  [BEV per Capita    ] Step  33: +B01002_2              Adj R² = 0.908597  (215 vars)\n",
      "  [BEV per Capita    ] Step  34: +B05002_7_P3           Adj R² = 0.911403  (221 vars)\n",
      "  [BEV per Capita    ] Step  35: +B08203_3              Adj R² = 0.914073  (226 vars)\n",
      "  [BEV per Capita    ] Step  36: +B08006_7_P3           Adj R² = 0.916206  (228 vars)\n",
      "  [BEV per Capita    ] Step  37: +B19101_2_C            Adj R² = 0.918404  (243 vars)\n",
      "  [BEV per Capita    ] Step  38: +B08124_7_P1           Adj R² = 0.921708  (249 vars)\n",
      "  [BEV per Capita    ] Step  39: +B08124_8_P2           Adj R² = 0.924199  (255 vars)\n",
      "  [BEV per Capita    ] Step  40: +B19126_6              Adj R² = 0.926083  (257 vars)\n",
      "  [BEV per Capita    ] Step  41: +B02001_3              Adj R² = 0.927558  (259 vars)\n",
      "  [BEV per Capita    ] Step  42: +B19081_2              Adj R² = 0.929288  (263 vars)\n",
      "  [BEV per Capita    ] Step  43: +B06010_5_C            Adj R² = 0.931008  (270 vars)\n",
      "  [BEV per Capita    ] Step  44: +B06012_6_P1           Adj R² = 0.932551  (273 vars)\n",
      "  [BEV per Capita    ] Step  45: +B08016_7              Adj R² = 0.934197  (277 vars)\n",
      "  [BEV per Capita    ] Step  46: +B05011_3_C            Adj R² = 0.935582  (283 vars)\n",
      "  [BEV per Capita    ] Step  47: +B08203_4_C_P1         Adj R² = 0.936974  (287 vars)\n",
      "  [BEV per Capita    ] Step  48: +B17020_4_C_P1         Adj R² = 0.938657  (293 vars)\n",
      "  [BEV per Capita    ] Step  49: +B01002_1              Adj R² = 0.940319  (294 vars)\n",
      "  [BEV per Capita    ] Step  50: +B15011_9_P3           Adj R² = 0.941989  (299 vars)\n",
      "  [BEV per Capita    ] Step  51: +B15011_4_P3           Adj R² = 0.943734  (304 vars)\n",
      "  [BEV per Capita    ] Step  52: +B28011_3_P2           Adj R² = 0.945092  (308 vars)\n",
      "  [BEV per Capita    ] Step  53: +C24050_7_P1           Adj R² = 0.946595  (321 vars)\n",
      "  [BEV per Capita    ] Step  54: +C24050_2_P1           Adj R² = 0.949965  (339 vars)\n",
      "  [BEV per Capita    ] Step  55: +B25104_2_C            Adj R² = 0.952618  (354 vars)\n",
      "  [BEV per Capita    ] Step  56: +B15011_10             Adj R² = 0.954821  (359 vars)\n",
      "  [BEV per Capita    ] Step  57: +B06010_11_P2          Adj R² = 0.957001  (367 vars)\n",
      "  [BEV per Capita    ] Step  58: +B06010_11_C           Adj R² = 0.958678  (374 vars)\n",
      "  [BEV per Capita    ] Step  59: +B08016_10_P2          Adj R² = 0.960438  (376 vars)\n",
      "  [BEV per Capita    ] Step  60: +B17020_4_P2           Adj R² = 0.962404  (383 vars)\n",
      "  [BEV per Capita    ] Step  61: +C24050_6_P2           Adj R² = 0.964559  (396 vars)\n",
      "  [BEV per Capita    ] Step  62: +B08018_4              Adj R² = 0.966340  (398 vars)\n",
      "  [BEV per Capita    ] Step  63: +B08124_5_P1           Adj R² = 0.968288  (404 vars)\n",
      "  [BEV per Capita    ] Step  64: +B08201_6_C_P1         Adj R² = 0.970660  (408 vars)\n",
      "  [BEV per Capita    ] Step  65: +B15011_4_P2           Adj R² = 0.973083  (413 vars)\n",
      "  [BEV per Capita    ] Step  66: +B08203_5_C_P2         Adj R² = 0.974921  (417 vars)\n",
      "  [BEV per Capita    ] Step  67: +B25133_3_C_P2         Adj R² = 0.976045  (422 vars)\n",
      "  [BEV per Capita    ] Step  68: +B08124_8_P1           Adj R² = 0.977098  (428 vars)\n",
      "  [BEV per Capita    ] Step  69: +B06012_6_C            Adj R² = 0.978205  (430 vars)\n",
      "  [BEV per Capita    ] Step  70: +C24050_4              Adj R² = 0.979629  (443 vars)\n",
      "  [BEV per Capita    ] Step  71: +B08017_4_P1           Adj R² = 0.981082  (445 vars)\n",
      "  [BEV per Capita    ] Step  72: +B08017_4_P3           Adj R² = 0.982946  (447 vars)\n",
      "  [BEV per Capita    ] Step  73: +B08201_6              Adj R² = 0.984778  (452 vars)\n",
      "  [BEV per Capita    ] Step  74: +B06010_8_P1           Adj R² = 0.985718  (454 vars)\n",
      "  [BEV per Capita    ] Step  75: +B08202_5_C_P2         Adj R² = 0.986948  (457 vars)\n",
      "  [BEV per Capita    ] Step  76: +C24050_6_P1           Adj R² = 0.988265  (470 vars)\n",
      "  [BEV per Capita    ] Step  77: +B15011_4_P1           Adj R² = 0.989653  (475 vars)\n",
      "  [BEV per Capita    ] Step  78: +B19101_2_C_P1         Adj R² = 0.991358  (490 vars)\n",
      "  [BEV per Capita    ] Step  79: +B08006_10             Adj R² = 0.992723  (496 vars)\n",
      "  [BEV per Capita    ] Step  80: +B08124_7              Adj R² = 0.995008  (502 vars)\n",
      "  [BEV per Capita    ] Step  81: +B05011_3_C_P2         Adj R² = 0.996080  (508 vars)\n",
      "  [BEV per Capita    ] Step  82: +B08018_3_P1           Adj R² = 0.996835  (510 vars)\n",
      "  [BEV per Capita    ] Step  83: +B08203_6              Adj R² = 0.998059  (515 vars)\n",
      "  [BEV per Capita    ] Step  84: +B08017_6_P2           Adj R² = 0.999047  (517 vars)\n",
      "  [BEV per Capita    ] Step  85: +B06012_5_P1           Adj R² = 0.999529  (520 vars)\n",
      "  [BEV per Capita    ] Step  86: +B08017_8_P1           Adj R² = 0.999781  (522 vars)\n",
      "  [BEV per Capita    ] Step  87: +B23027_7              Adj R² = 0.999927  (524 vars)\n",
      "  [BEV per Capita    ] Step  88: +B25018_1              Adj R² = 0.999951  (525 vars)\n",
      "  [BEV per Capita    ] Step  89: +B05002_6_P1           Adj R² = 0.999981  (527 vars)\n",
      "  [BEV per Capita    ] → 89 groups, 527 vars, Adj R² = 0.999981\n",
      "  Saved Results_micro_individual.csv  (250 rows)\n",
      "\n",
      "=================================================================\n",
      "  Grouping: micro (702 groups)  |  Mode: aggregate\n",
      "=================================================================\n",
      "  Step   1: +B15002_3              Mean = 0.660979  (16 vars)  (BEV Count: 0.9448  BEV Proportion: 0.5566  BEV per Capita: 0.4816)\n",
      "  Step   2: +B15002_3_P2           Mean = 0.814868  (32 vars)  (BEV Count: 0.9446  BEV Proportion: 0.7797  BEV per Capita: 0.7202)\n",
      "  Step   3: +B25104_2_C_P1         Mean = 0.836386  (47 vars)  (BEV Count: 0.9499  BEV Proportion: 0.8070  BEV per Capita: 0.7522)\n",
      "  Step   4: +B25132_3_P2           Mean = 0.853233  (53 vars)  (BEV Count: 0.9501  BEV Proportion: 0.8404  BEV per Capita: 0.7692)\n",
      "  Step   5: +B06010_7              Mean = 0.864696  (61 vars)  (BEV Count: 0.9577  BEV Proportion: 0.8493  BEV per Capita: 0.7870)\n",
      "  Step   6: +B15002_3_C_P1         Mean = 0.873980  (76 vars)  (BEV Count: 0.9569  BEV Proportion: 0.8622  BEV per Capita: 0.8028)\n",
      "  Step   7: +B23027_2_C_P1         Mean = 0.879181  (82 vars)  (BEV Count: 0.9564  BEV Proportion: 0.8665  BEV per Capita: 0.8146)\n",
      "  Step   8: +B25040_2              Mean = 0.884733  (91 vars)  (BEV Count: 0.9628  BEV Proportion: 0.8702  BEV per Capita: 0.8212)\n",
      "  Step   9: +B06009_6_P2           Mean = 0.888740  (96 vars)  (BEV Count: 0.9633  BEV Proportion: 0.8733  BEV per Capita: 0.8297)\n",
      "  Step  10: +B08016_3_P2           Mean = 0.892716  (100 vars)  (BEV Count: 0.9639  BEV Proportion: 0.8777  BEV per Capita: 0.8366)\n",
      "  Step  11: +C24050_3              Mean = 0.897287  (113 vars)  (BEV Count: 0.9667  BEV Proportion: 0.8842  BEV per Capita: 0.8410)\n",
      "  Step  12: +B15011_8              Mean = 0.900681  (118 vars)  (BEV Count: 0.9679  BEV Proportion: 0.8904  BEV per Capita: 0.8438)\n",
      "  Step  13: +B06010_7_C_P1         Mean = 0.903984  (125 vars)  (BEV Count: 0.9682  BEV Proportion: 0.8948  BEV per Capita: 0.8490)\n",
      "  Step  14: +B25132_3              Mean = 0.907360  (131 vars)  (BEV Count: 0.9712  BEV Proportion: 0.8966  BEV per Capita: 0.8543)\n",
      "  Step  15: +B06010_7_P3           Mean = 0.910825  (139 vars)  (BEV Count: 0.9711  BEV Proportion: 0.9023  BEV per Capita: 0.8590)\n",
      "  Step  16: +B08303_2_C            Mean = 0.913497  (150 vars)  (BEV Count: 0.9733  BEV Proportion: 0.9050  BEV per Capita: 0.8622)\n",
      "  Step  17: +B06009_4              Mean = 0.915379  (155 vars)  (BEV Count: 0.9774  BEV Proportion: 0.9053  BEV per Capita: 0.8634)\n",
      "  Step  18: +B08016_10_P1          Mean = 0.917262  (157 vars)  (BEV Count: 0.9775  BEV Proportion: 0.9070  BEV per Capita: 0.8673)\n",
      "  Step  19: +B08201_5_C_P2         Mean = 0.919600  (161 vars)  (BEV Count: 0.9776  BEV Proportion: 0.9097  BEV per Capita: 0.8715)\n",
      "  Step  20: +B19049_2              Mean = 0.921586  (165 vars)  (BEV Count: 0.9776  BEV Proportion: 0.9113  BEV per Capita: 0.8759)\n",
      "  Step  21: +B08014_3              Mean = 0.923650  (171 vars)  (BEV Count: 0.9789  BEV Proportion: 0.9125  BEV per Capita: 0.8795)\n",
      "  Step  22: +C24050_7_P1           Mean = 0.925411  (184 vars)  (BEV Count: 0.9787  BEV Proportion: 0.9155  BEV per Capita: 0.8820)\n",
      "  Step  23: +B06010_9              Mean = 0.927194  (192 vars)  (BEV Count: 0.9801  BEV Proportion: 0.9170  BEV per Capita: 0.8845)\n",
      "  Step  24: +B25133_3_C_P1         Mean = 0.929255  (197 vars)  (BEV Count: 0.9800  BEV Proportion: 0.9200  BEV per Capita: 0.8877)\n",
      "  Step  25: +B08017_3              Mean = 0.930917  (201 vars)  (BEV Count: 0.9801  BEV Proportion: 0.9221  BEV per Capita: 0.8906)\n",
      "  Step  26: +B15002_4_C            Mean = 0.932374  (216 vars)  (BEV Count: 0.9817  BEV Proportion: 0.9227  BEV per Capita: 0.8927)\n",
      "  Step  27: +B15002_4_C_P1         Mean = 0.934091  (231 vars)  (BEV Count: 0.9820  BEV Proportion: 0.9229  BEV per Capita: 0.8973)\n",
      "  Step  28: +B19019_2              Mean = 0.936430  (238 vars)  (BEV Count: 0.9818  BEV Proportion: 0.9250  BEV per Capita: 0.9025)\n",
      "  Step  29: +B15011_10_P2          Mean = 0.938133  (243 vars)  (BEV Count: 0.9818  BEV Proportion: 0.9273  BEV per Capita: 0.9052)\n",
      "  Step  30: +B17020_3_C_P1         Mean = 0.939793  (249 vars)  (BEV Count: 0.9819  BEV Proportion: 0.9291  BEV per Capita: 0.9084)\n",
      "  Step  31: +B19126_3              Mean = 0.941424  (251 vars)  (BEV Count: 0.9820  BEV Proportion: 0.9314  BEV per Capita: 0.9108)\n",
      "  Step  32: +B25132_3_C_P1         Mean = 0.942767  (256 vars)  (BEV Count: 0.9819  BEV Proportion: 0.9326  BEV per Capita: 0.9138)\n",
      "  Step  33: +B25104_2_C            Mean = 0.944270  (271 vars)  (BEV Count: 0.9850  BEV Proportion: 0.9331  BEV per Capita: 0.9146)\n",
      "  Step  34: +B19101_2_C            Mean = 0.946116  (286 vars)  (BEV Count: 0.9860  BEV Proportion: 0.9354  BEV per Capita: 0.9170)\n",
      "  Step  35: +C24050_4_P1           Mean = 0.947301  (299 vars)  (BEV Count: 0.9857  BEV Proportion: 0.9376  BEV per Capita: 0.9186)\n",
      "  Step  36: +B08012_3_C_P2         Mean = 0.949003  (310 vars)  (BEV Count: 0.9859  BEV Proportion: 0.9387  BEV per Capita: 0.9224)\n",
      "  Step  37: +B08124_8              Mean = 0.950369  (316 vars)  (BEV Count: 0.9870  BEV Proportion: 0.9401  BEV per Capita: 0.9240)\n",
      "  Step  38: +B06010_3              Mean = 0.951773  (324 vars)  (BEV Count: 0.9880  BEV Proportion: 0.9416  BEV per Capita: 0.9257)\n",
      "  Step  39: +B06012_6_P2           Mean = 0.953013  (327 vars)  (BEV Count: 0.9881  BEV Proportion: 0.9431  BEV per Capita: 0.9279)\n",
      "  Step  40: +B08016_1              Mean = 0.954623  (328 vars)  (BEV Count: 0.9882  BEV Proportion: 0.9454  BEV per Capita: 0.9303)\n",
      "  Step  41: +B08016_10_P2          Mean = 0.956111  (330 vars)  (BEV Count: 0.9883  BEV Proportion: 0.9475  BEV per Capita: 0.9325)\n",
      "  Step  42: +B08006_7_P3           Mean = 0.957601  (332 vars)  (BEV Count: 0.9883  BEV Proportion: 0.9492  BEV per Capita: 0.9352)\n",
      "  Step  43: +B08012_3_C            Mean = 0.959377  (343 vars)  (BEV Count: 0.9899  BEV Proportion: 0.9505  BEV per Capita: 0.9377)\n",
      "  Step  44: +B05002_7              Mean = 0.961181  (349 vars)  (BEV Count: 0.9903  BEV Proportion: 0.9530  BEV per Capita: 0.9402)\n",
      "  Step  45: +C17002_2_P1           Mean = 0.963059  (356 vars)  (BEV Count: 0.9904  BEV Proportion: 0.9548  BEV per Capita: 0.9440)\n",
      "  Step  46: +C24050_3_P2           Mean = 0.964501  (369 vars)  (BEV Count: 0.9904  BEV Proportion: 0.9565  BEV per Capita: 0.9466)\n",
      "  Step  47: +B08203_3_P1           Mean = 0.965829  (374 vars)  (BEV Count: 0.9902  BEV Proportion: 0.9595  BEV per Capita: 0.9478)\n",
      "  Step  48: +B08006_13_P2          Mean = 0.967259  (379 vars)  (BEV Count: 0.9901  BEV Proportion: 0.9609  BEV per Capita: 0.9508)\n",
      "  Step  49: +B08201_3              Mean = 0.968840  (384 vars)  (BEV Count: 0.9904  BEV Proportion: 0.9625  BEV per Capita: 0.9537)\n",
      "  Step  50: +B23027_2              Mean = 0.970058  (391 vars)  (BEV Count: 0.9915  BEV Proportion: 0.9636  BEV per Capita: 0.9550)\n",
      "  Step  51: +B08017_6_P2           Mean = 0.971012  (393 vars)  (BEV Count: 0.9920  BEV Proportion: 0.9640  BEV per Capita: 0.9570)\n",
      "  Step  52: +B05002_5_P1           Mean = 0.972006  (396 vars)  (BEV Count: 0.9921  BEV Proportion: 0.9651  BEV per Capita: 0.9588)\n",
      "  Step  53: +B08124_3              Mean = 0.973227  (402 vars)  (BEV Count: 0.9928  BEV Proportion: 0.9669  BEV per Capita: 0.9601)\n",
      "  Step  54: +B08124_8_P1           Mean = 0.974399  (408 vars)  (BEV Count: 0.9928  BEV Proportion: 0.9684  BEV per Capita: 0.9620)\n",
      "  Step  55: +C24050_5              Mean = 0.975648  (421 vars)  (BEV Count: 0.9926  BEV Proportion: 0.9708  BEV per Capita: 0.9635)\n",
      "  Step  56: +B28011_3_P2           Mean = 0.977165  (425 vars)  (BEV Count: 0.9929  BEV Proportion: 0.9723  BEV per Capita: 0.9662)\n",
      "  Step  57: +B15011_4              Mean = 0.978565  (430 vars)  (BEV Count: 0.9932  BEV Proportion: 0.9747  BEV per Capita: 0.9678)\n",
      "  Step  58: +B08203_6_C_P2         Mean = 0.980028  (434 vars)  (BEV Count: 0.9932  BEV Proportion: 0.9761  BEV per Capita: 0.9707)\n",
      "  Step  59: +B08124_7_P2           Mean = 0.981361  (440 vars)  (BEV Count: 0.9932  BEV Proportion: 0.9782  BEV per Capita: 0.9727)\n",
      "  Step  60: +B17020_3              Mean = 0.982903  (447 vars)  (BEV Count: 0.9934  BEV Proportion: 0.9803  BEV per Capita: 0.9750)\n",
      "  Step  61: +B08017_5_P2           Mean = 0.984457  (449 vars)  (BEV Count: 0.9935  BEV Proportion: 0.9828  BEV per Capita: 0.9771)\n",
      "  Step  62: +B15011_5_P1           Mean = 0.985960  (454 vars)  (BEV Count: 0.9933  BEV Proportion: 0.9846  BEV per Capita: 0.9801)\n",
      "  Step  63: +B08017_10_P3          Mean = 0.987821  (456 vars)  (BEV Count: 0.9939  BEV Proportion: 0.9856  BEV per Capita: 0.9839)\n",
      "  Step  64: +B08301_4_P1           Mean = 0.989244  (461 vars)  (BEV Count: 0.9938  BEV Proportion: 0.9873  BEV per Capita: 0.9866)\n",
      "  Step  65: +B06010_9_C_P2         Mean = 0.990494  (468 vars)  (BEV Count: 0.9935  BEV Proportion: 0.9888  BEV per Capita: 0.9891)\n",
      "  Step  66: +B06010_11_P3          Mean = 0.991403  (476 vars)  (BEV Count: 0.9934  BEV Proportion: 0.9899  BEV per Capita: 0.9909)\n",
      "  Step  67: +B08006_5_P1           Mean = 0.992482  (481 vars)  (BEV Count: 0.9938  BEV Proportion: 0.9906  BEV per Capita: 0.9931)\n",
      "  Step  68: +C24050_5_P1           Mean = 0.993513  (494 vars)  (BEV Count: 0.9946  BEV Proportion: 0.9923  BEV per Capita: 0.9937)\n",
      "  Step  69: +C24050_5_P2           Mean = 0.994645  (507 vars)  (BEV Count: 0.9954  BEV Proportion: 0.9931  BEV per Capita: 0.9955)\n",
      "  Step  70: +B28010_3_P1           Mean = 0.995816  (509 vars)  (BEV Count: 0.9956  BEV Proportion: 0.9956  BEV per Capita: 0.9962)\n",
      "  Step  71: +B08016_10             Mean = 0.996870  (511 vars)  (BEV Count: 0.9961  BEV Proportion: 0.9977  BEV per Capita: 0.9967)\n",
      "  Step  72: +B19101_2_C_P1         Mean = 0.999293  (526 vars)  (BEV Count: 0.9983  BEV Proportion: 0.9996  BEV per Capita: 0.9999)\n",
      "  Step  73: +B01002_1              Mean = 0.999621  (527 vars)  (BEV Count: 0.9994  BEV Proportion: 0.9996  BEV per Capita: 0.9999)\n",
      "  → 73 groups, 527 vars, Mean Adj R² = 0.999621\n",
      "  Saved Results_micro_aggregate.csv  (73 rows)\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "for grouping in GROUPING_LEVELS:\n",
    "    grp_map = grouping_maps[grouping]\n",
    "\n",
    "    for mode in RESPONSE_MODES:\n",
    "        filename = f\"Results_{grouping}_{mode}.csv\"\n",
    "        if os.path.exists(filename):\n",
    "            print(f\"SKIP  {filename}  (already exists)\", flush=True)\n",
    "            continue\n",
    "\n",
    "        print(f\"\\n{'='*65}\", flush=True)\n",
    "        print(f\"  Grouping: {grouping} ({len(grp_map)} groups)  |  Mode: {mode}\", flush=True)\n",
    "        print(f\"{'='*65}\", flush=True)\n",
    "\n",
    "        if mode == \"individual\":\n",
    "            all_rows: list[dict] = []\n",
    "            for resp in RESPONSES:\n",
    "                y = y_vectors[resp]\n",
    "                hist = []\n",
    "                for row in forward_selection_individual(X_full, y, grp_map, MAX_ITER):\n",
    "                    hist.append(row)\n",
    "                    all_rows.append({\"Response\": resp, **row})\n",
    "                    print(f\"  [{resp:18s}] Step {row['Step']:>3d}: \"\n",
    "                          f\"+{row['Group Added']:20s}  \"\n",
    "                          f\"Adj R² = {row['Adj R²']:.6f}  \"\n",
    "                          f\"({row['Cumul. Predictors']} vars)\", flush=True)\n",
    "                if hist:\n",
    "                    print(f\"  [{resp:18s}] → {len(hist)} groups, \"\n",
    "                          f\"{hist[-1]['Cumul. Predictors']} vars, \"\n",
    "                          f\"Adj R² = {hist[-1]['Adj R²']:.6f}\", flush=True)\n",
    "                else:\n",
    "                    print(f\"  [{resp:18s}] → no groups selected\", flush=True)\n",
    "            results = pl.DataFrame(all_rows)\n",
    "\n",
    "        else:  # aggregate\n",
    "            hist = []\n",
    "            for row in forward_selection_aggregate(X_full, y_vectors, grp_map, MAX_ITER):\n",
    "                hist.append(row)\n",
    "                per_resp = \"  \".join(\n",
    "                    f\"{r}: {row.get(f'Adj R² ({r})', 0):.4f}\"\n",
    "                    for r in RESPONSES\n",
    "                )\n",
    "                print(f\"  Step {row['Step']:>3d}: +{row['Group Added']:20s}  \"\n",
    "                      f\"Mean = {row['Mean Adj R²']:.6f}  \"\n",
    "                      f\"({row['Cumul. Predictors']} vars)  \"\n",
    "                      f\"({per_resp})\", flush=True)\n",
    "            if hist:\n",
    "                print(f\"  → {len(hist)} groups, \"\n",
    "                      f\"{hist[-1]['Cumul. Predictors']} vars, \"\n",
    "                      f\"Mean Adj R² = {hist[-1]['Mean Adj R²']:.6f}\", flush=True)\n",
    "            else:\n",
    "                print(f\"  → no groups selected\", flush=True)\n",
    "            results = pl.DataFrame(hist)\n",
    "\n",
    "        results.write_csv(filename)\n",
    "        print(f\"  Saved {filename}  ({results.height} rows)\", flush=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
